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I.  EXECUTIVE  SUMMARY 


This  report  summarizes  accomplishments  and  overall  progress  made  during  the 
period  of  May  1,  1987  to  December  31,  1988.  It  is  the  final  project  report  for  the 
research  project  sponsored  by  the  Office  of  Naval  Research  under  Contract  N00014-87- 
k-0284.  It  enlists  publications  and  professional  activities  of  the  Principal  Investigators 
during  this  period.  A  selected  subset  of  the  Principal  Investigators’  publications  is 
included  in  the  Technical  Appendices. 

Recent  advances  in  integrated  circuits  and  signal  processing  technology  have 
prompted  the  need  of  algorithms  with  higher  degrees  of  modularity  and  pipelineabil- 
ity ,  resulting  in  higher  degrees  of  concurrency  and  machine  perception.  It  is  generally 
believed  in  the  research  community  that  higher  degrees  of  concurrency  and  machine 
perception  will  bring  forth  many  breakthroughs  in  many  signal  processing  areas,  par¬ 
ticularly  in  adaptive  systems,  speech  and  image  processing  and  recognition. 

The  objective  of  this  research  project  is  to  develop  an  adaptive  signal  processing 
architecture  with  important  features  such  as  modularity,  pipelineability ,  and  intelli¬ 
gent  use  of  information.  The  ground  work  upon  which  this  research  project  rests  is 
a  recursive  parameter  estimation  algorithm,  i.e.,  the  so-called  OEE  algorithm,  which 
features  a  discerning  update  strategy.  This  discerning  update  is  in  sharp  contrast  to 
the  continual  update  used  by  most  existing  algorithms. 

--  The  estimation  algorithm  has  been  developed  with  a  set-theoretic  framework.  In 
particular,  starting  with  the  assumption  that  the  underlying  noise  (of  the  system 
being  studied)  is  bounded  in  magnitude,  a  recursive  least-sqaures  type  of  estimation 
algorithm  was  obtained  with  a  discerning  update  strategy.  An  important  outcome  of 
such  discerning  updates  is  that  the  resulting  algorithm  can  be  implemented  with  two 
modules:  an  information  processor  followed  by  an  updating  processor.  .  The  former 
decides  whether  an  update  is  needed,  and  the  decision  is  based  on  the  evaluation  of 
the  ’’  information  quality ”  of  the  input  data,  the  prediction  error ,  and  the  noise  bound. 
It  is  essential  that  the  information  evaluation  involves  very  little  computational  effort, 
which  is  the  case  here.  The  latter  then  updates  the  parameter  estimates  when  the 
information  processor  decides  that  such  is  needed. 

simulation  results  have  shown,  in  general,  that  only  less  than  20%  of  the  input 
data  are  used  to  update  the  parameter  estimates.  This  >s  tr”e  for  most  practical 
systems  that  can  be  modeled  by  autoregressive  processes  with  exogeneous  inputs 
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(ARX)  or  autoregressive  moving  average  (ARMA)  processes  whose  order  is  less  than 
ten. 

Conceptually,  thanks  to  the  modularity  and  to  the  fact  that  only  less  than  20% 
of  input  data  are  used  to  update  the  parameter  estimates,  an  adaptive  signal  pro¬ 
cessing  network  may  be  constructed.  The  network  will  consist  of  a  number  of  such 
modular  recursive  estimators,  each  of  which  is  comprised  of  two  modules,  namely,  the 
information  evaluator  and  the  updating  processor.  As  such,  the  idling  time  of  both 
the  information  evaluator  and  the  updating  processor  can  be  reduced,  ihus  the  data 
throughput  rate  will  be  increased.  In  addition,  the  reliability  of  signal  processing  can 
be  improved  greatly.  In  essence,  this  type  of  adaptive  networks  will  be  able  to  pro¬ 
cess  multi-channel  adaptation  and  filtering ,  improving  reliability  and  data  throughput 
rates.  One  of  the  important  applications  for  this  is  adaptive  array  processing  in  sonar 
systems. 

In  this  project,  several  fundamental  issues  associated  with  the  development  of  this 
type  of  networks  are  investigated.  These  issues  include  modelling,  statistical  analysis 
of  the  underlying  estimation  algorithm,  theoretical  studies  on  convergence  issues,  and 
finite  word-length  effects. 

Modelling  of  the  pipelined  concurrent  networking  structure  was  investigated  with 
the  aid  of  discrete  event  models.  In  particular,  we  have  devised  a  time-sharing  type 
of  network  structure  and  employed  Petri  nets  to  model  the  data  flow.  Timing,  syn¬ 
chronization,  and  routing  of  the  underlying  arithmetic  operations  and  propagation  of 
data  are  shown  to  be  easily  modeled  by  timed  Petri  nets.  It  was  seen  that  the  virtues 
of  the  adaptive  network  structure  include  improvements  of  data  throughput  rates, 
cost  reduction  of  signal  processing  hardware,  and  improvements  of  reliability. 

Statistical  analysis  was  also  performed  for  the  estimation  algorithm  in  its  appli¬ 
cation  to  ARX  parameter  estimation.  This  study  is  important  in  the  development  of 
routing  strategy  of  data  flow  and  performance  analysis  of  the  network.  Asymptotic 
unbiasedness  of  the  estimates  and  boundedness  of  estimation  error  covariance  have 
been  established  under  very  general  conditions,  such  as  whiteness  and  zero  mean  for 
the  underlying  noise. 

An  extension  of  the  OBE  algorithm  (the  EOBE  algorithm)  has  been  developed  to 
deal  with  ARMA  parameter  estimation.  The  results  are  particularly  useful  for  appli¬ 
cations  such  as  spectral  estimation.  The  algorithm  retains  the  feature  of  discerning 
update  strategy  and  is  shown  to  yield  bounded  a  posteriori  prediction  errors  without 
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the  premise  of  the  strictly  positive  real  (SPR)  condition.  This  is  in  sharp  contrast  to 
the  conventional  algorithms  such  as  the  extended  least-squares  (ELS)  algorithm. 

It  is  well  known  that  the  SPR  condition  plays  a  crucial  role  in  the  analysis  of  the 
ELS  algorithm.  Specifically,  without  the  SPR  condition,  the  ELS  can  only  be  shown 
to  have  bounded  mean-square  a  posteriori  prediction  error.  In  addition,  simulation 
results  have  shown  that  performance  of  the  EOBE  algorithm  is  comparable  to  that  of 
the  ELS,  in  terms  of  the  asymptotic  bias  of  the  estimates,  a  posteriori  prediction  error, 
and  a  priori  prediction  error;  even  though  the  EOBE  used  only  less  than  20%  of  the 
input  data  to  update  the  estimates.  An  important  point  here  is  that  implementation 
of  the  EOBE  requires  only  that  the  underlying  noise  is  bounded  in  magnitude.  The 
development  of  this  EOBE  algorithm  will  help  us  in  the  application  of  the  adaptive 
network  to  spectral  estimation. 

Implementation  on  finite  word-length  processors  has  been  studied  via  simulations. 
In  particular,  the  effects  of  roundoff  error  accumulation  and  numerical  stability  were 
studied  with  fixed  point  simulations.  Based  on  these  preliminary  results,  it  has  been 
seen  that  the  OBE  and  the  EOBE  appear  to  be  superior  to  the  RLS  and  the  ELS, 
respectively.  One  of  the  possible  reasons  for  such  encouraging  results  is  the  discerning 
update  strategy  which  updates  parameter  estimates  less  frequently,  thereby  accumu¬ 
lates  less  roundoff  errors.  Another  reason  is  imbedded  in  the  update  equations  which 
may  require  more  detailed  analysis.  Nevertheless,  these  results  further  verify  our 
conjecture  that  eliminating  redundant  use  of  information,  contained  in  the  received 
data,  would  reduce  the  effects  of  roundoff  errors. 

In  summary,  our  studies  showed  that  the  proposed  adaptive  network  structure 
is  viable.  The  concept  of  discerning  update,  on  which  the  OBE  algorithm  and  its 
extension  (the  EOBE  algorithm)  are  based,  is  appealing  in  practice  as  well  as  in 
theory.  The  progress  made  in  the  project  not  only  demonstrated  the  OBE  algorithm 
as  a  viable  scheme  for  parameter  estimation,  but  also  opened  some  new  avenues  in 
the  subject  area.  The  algorithm  is  applicable  to  a  large  class  of  systems  for  which 
no  a  priori  statistical  information  is  available  and  the  resulting  parameter  estimates 
are  assured  of  good  quality.  The  only  assumption  made  in  the  implementation  and 
analysis  of  the  algorithms  is  boundedness  on  the  magnitude  of  the  underlying  noise. 
Due  to  the  discerning  update  strategy  of  recursive  estimation,  the  resulting  adaptive 
system  features  a  higher  level  of  machine  perception ,  higher  degrees  of  modularity, 
and  is  less  susceptible  to  finite  word-length  processing. 
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In  addition  to  the  above  accomplishments,  our  experience  in  adaptive  signal  pro¬ 
cessing  using  algorithms  with  discerning  updates  have  facilitated  studies  on  artificial 
neural  networks.  In  particular,  a  learning  algorithm  in  neural  networks  with  selective 
updates  has  been  developed  as  an  additional  accomplishment  of  the  project.  The 
resulting  algorithm  is  the  so-called  selective  update  back  propagation  (SUBP). 

Recent  resurgence  in  the  field  of  artificial  neural  networks  has  called  for  much 
attention  to  learning  algorithms.  Learning  capability  of  artificial  neural  networks 
may  essentially  be  responsible  for  distinguishing  those  networks  from  conventional 
computing  and  data  processing  methodologies. 

A  commonly  studied  learning  algorithm  is  the  so-called  back  propagation  (BP) 
algorithm  for  perceptron-like  networks.  The  back  propagation  algorithm  is  believed  to 
have  the  most  potential  to  date  for  generalization.  It  bears  a  great  deal  of  similarity  to 
the  least-mean-squares  (LMS)  algorithm  in  adaptive  systems.  It  updates  continually 
regardless  of  the  benefit  of  such  updates. 

As  shown  by  many  simulation  studies,  learning  with  BP  appears  to  be  rather 
inefficient  as  the  algorithm  often  fails  to  converge.  As  a  matter  of  fact,  in  some 
practical  pattern  classification  problems,  it  can  be  shown  that  using  BP  to  train  the 
network  will  take  it  farther  and  farther  away  from  the  desired  results.  We  suspect 
that  the  continual  updating  used  by  the  BP  algorithm,  which  results  in  redundant 
use  of  data  that  may  not  be  informative,  is  a  handicap  of  the  learning  procedure. 

The  SUBP  algorithm,  employing  the  concept  of  discerning  updates,  appears  to 
have  good  potential  for  circumventing  these  difficulties.  Further  investigations  on 
SUBP  will  be  needed. 
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IT.  INVITED  LECTURES/SEMINARS 

II. A.  Invited  Seminars  at  Academic  Institutions: 

•  Y.  F.  Huang: 

1.  “Intelligent  Learning  of  Artificial  Neural  Networks,”  June  27,  1988.  Institute 
of  Information  Science,  Sinica  Academia,  Taiwan. 

2.  “Discerning  Update  in  Recursive  Parameter  Estimation,”  Nov.  3,  1988.  De¬ 
partment  of  Electrical  Engineering,  Princeton  University,  Princeton,  N.J. 

•  R.  Liu: 

1.  “Neural  Networks  -  A  New  Breed  of  Computers”,  June  27,  1988.  Institute  of 
Information  Science,  Sinica  Academia,  Taiwan. 

2.  “Neural  Computing”,  October  10,  1988.  Fu-dan  University,  Shanghai,  China. 

3.  “Neural  Networks:  Architecture  and  Learning”,  October  13-14,  1988.  Chinese 
Academy  of  Science,  Beijing,  China. 

4.  “Neural  Networks:  Architecture  and  Learning,  October  21,  and  24,  1988.  Na¬ 
tional  Taiwan  University,  Taipei,  Taiwan.  • 


II.B.  Invited  Conference  Presentations 

•  Y.  F.  Huang: 

1.  Y.  F.  Huang  and  R.  Liu, 

“A  High  Level  Parallel-Pipelined  Network  Architecture  for  Adaptive  Signal 
Processing”,  Twenty-Sixth  IEEE  Conf.  on  Decision  and  Control ,  Los 
Angeles,  CA,  December  9-11,  1987. 

2.  A.  K.  Rao  and  Y.  F.  Huang, 

“Recursive  ARMA  Parameter  Estimation  with  a  Discerning  Update  Strategy 
-  Finite  Precision  Effects,”  ComCon-88,  Advances  in  Communication  and 
Control  Systems ,  Baton  Rouge,  LA,  October  19-21,  1988. 

3.  V.  C.  Soon  and  Y.  F.  Huang, 

“Applications  and  Analysis  of  Second  Order  Artificial  Neural  Networks.” 
Twenty-Seventh  IEEE  Conf.  on  Decision  and  Control ,  Austin,  TX, 

December  7-9,  1988. 

•  R.  Liu: 

1.  Qiu  Huang  and  R.  Liu, 

“Diagnosis  of  Piecewise- Li  near  Circuits”,  IEEE  International  Symposium  on 
Circuits  and  Systems,  Philadelphia,  PA,  May  4-6,  1987. 
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2.  Vijay  Raman  and  R.  Liu, 

“Stabilization  of  Chaos  -  An  Algebraic  Theory”,  IEEE  International 
Symposium  on  Circuits  and  Systems,  Helsinki,  Finland,  June  6-8,  19S8. 

3.  D.  Graupe  and  R.  Liu, 

“Neural  Networks  for  Medical  Signal  Processing”,  27th  IEEE  Conf.  on 
Decision  and  Control,  Los  Angeles,  CA,  December  9-11,  1988. 


III.  PROFESSIONAL  ACTIVITIES 


*  Y.  F.  Huang: 

1.  Chaired  a  Session  entitled  “Estimation,”  at  the  22nd  Annual  Conference  on 
Information  Sciences  and  Systems,  Princeton  University,  Princeton,  N.J., 
March,  1988. 

2.  Organized  a  Special  Session  entitled,  “Adaptive  Signal  Processing,”  at  the 
ComCon-88,  Advances  in  Communications  and  Control  Systems,  Baton 
Rouge,  LA,  October  1988. 

•  R.  Liu: 

1.  Organized  a  special  session  entitled.  "Advanced  Topics  in  Nonlinear  Theory. 
Testing  and  Robotics”,  IEEE  International  Symposium  on  Circuits  and 
Systems,  Philadelphia,  PA,  May  1987. 

2.  Organized  two  special  sessions  entitled.  "Dynamics  of  Discrete  Events 
Systems  I  &  II”,  IEEE  Conf.  on  Decision  and  Control,  Los  Angeles,  CA. 
Dec.  1987. 

3.  Organized  a  special  session  entitled.  "Nonlinear  Behavior  of  Neural 
Networks”,  IEEE  International  Symposium  on  Circuits  and  Systems, 
Helsinki,  Finland,  June  7-9,  1988. 

4.  Organized  a  special  session  entitled.  "Advanced  Nonlinear  Theory”, 
International  symposium  on  Circuits  and  Systems.  Helsinki,  Finland,  June 
7-9,  1988. 

•5.  Associate  Editor,  IEEE  Transactions  on  Circuits  and  Systems,  June  1,  1985  - 
May  31,  1987. 

6.  Member  of  Program  Committee,  19*8  International  Symposium  on 
Communication  and  Control,  Baton  Rouge.  LA,  Oct.  1988. 

7.  Member  of  Program  Committee.  IEEE  International  Symposium  on  Circuits 
and  Systems,  Portland,  OR,  May  9-11.  1989. 

8.  Member  of  Administrative  Committee  of  the  IEEE  Circuits  and  Systems 
Society,  1987-1989. 
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IV.  THESES  AND  DISSERTATIONS  DIRECTED 


The  following  is  a  list  of  masters  theses  and  doctoral  dissertations  that  are  spon¬ 
sored,  in  part,  by  the  grant. 

•  Y.  F.  Huang: 

1.  Ashok  K.  Rao  (M.S.) 

Thesis  Title:  Applications  and  Extensions  of  a  Novel  Recursive  Estimation 
Algorithm  with  Selective  Updating.  (Graduation  Date:  August, 1987) 

2.  James  C.  Francis  (M.S.) 

Thesis  Title:  Distributed  Detection:  A  Totally  Nev man- Pearson  Optimal  De¬ 
sign  Scheme.  (Graduation  Date:  January,  1988) 

3.  Shih-Chi  Huang  (M.S.) 

Thesis  Title:  Learning  with  Selective  Updates  for  Artificial  Neural  Networks. 
(Graduation  Date:  August,  1988) 

4.  Victor  C.  Soon  (M.S.) 

Thesis  Title:  On  the  Capacity  of  Perceptron-Like  Neural  Networks.  (Gradua¬ 
tion  Date:  January,  1989) 

5.  Ashok  K.  Rao  (Ph.D.)  Dissertation  Title:  Recursive  Estimation  for  ARM  A 
Parameters  with  Selective  Updates.  (Expected  Graduation  Date:  August  1989). 

•  R.  Liu: 

1.  Qiu  Huang  (Ph.D.) 

Thesis  Title:  A  Novel  Approach  to  CAD  for  Large-Scale  Piecewise  Linear  Sys¬ 
tems,  (Graduation  Date:  May,  1988). 

2.  Zhi-hong  Chai  (M.S.) 

Thesis  Title:  Complexity  of  Inversion  of  Large  Matrices.  (Expected  Graduation 
Date:  December,  1989). 


V.  PUBLICATIONS 

V.A.  Invited  Conference  Publications 

*1.  Y.  F.  Huang, 

“A  Modular  Recursive  Estimator  for  Adaptive  Signal  Processing”,  Proc.  of 
the  Fifth  International  Conf.  on  Systems  Engineering ,  Dayton.  Ohio.  pp. 
481-484,  1987. 


*2.  Y.  F.  Huang  and  R.  Liu 

“A  High  Level  Parallel-Pipelined  Network  Architecture  for  Adaptive  Signal 
Processing”,  Proc.  of  the  26th  IEEE  Conf.  on  Decision  and  Control,  Los 
Angeles,  CA,  pp.  662-667,  1987. 

*3.  A.  K.  Rao  and  Y.  F.  Huang 

■‘Recursive  ARMA  Parameter  Estimation  with  a  Discerning  Update  Strategy 
-  Finite  Precision  Effects,”  Proc.  of  ComCon-88,  Advances  in 
Communication  and  Control  Systems,  Baton  Rouge,  LA,  1988. 

*4.  V.  C.  Soon  and  Y.  F.  Huang 

“Applications  and  Analysis  of  Second  Order  Artificial  Neural  Networks,” 
Proc.  of  the  27th  IEEE  Conf.  on  Decision  and  Control ,  pp.  348-349,  Austin. 
TX,  Dec.,  1988. 

5.  Qiu  Huang  and  R.  Liu 

“Fault  Diagnosis  of  P'ecewise-Linear  Systems”,  Proc.  IEEE  ISCAS,  pp. 
418-421,  May  4-7,  1987. 

6.  V.  Raman  and  R.  Liu 

“Stabilization  of  Chaos  -  An  Algebraic  Theory”.  Proc.  ISCAS,  pp.  1981-82. 
June  7-9,  1988. 

7.  D.  Graupe  and  R.  Liu 

“Applications  of  Neural  Networks  to  Medical  Signal  Processing”.  Proc.  rDC. 

pp.  343-347.  1988 

*8.  R.  Liu,  L.  Tong  and  J.  Deng 

“An  On-line  Learning  Neural  Network  for  Discrimination  of  Walk  Functions 
in  Paraplegics”,  1989  IEEE  ISCAS.  (to  appear). 

9.  R.  Liu,  L.  Tong  and  L.  Zhang 

“A  Necessary  and  Sufficient  Condition  for  the  Testability  of  Hybrid 
Circuits”,  1989  IEEE  ISCAS,  (to  appear). 

V.B.  Refereed  Conference  Publications 
*1.  V.  C.  Soon  and  Y.  F.  Huang 

“Artificial  Neural  Networks  with  Second  Order  Discriminant  Functions”. 
Proc.  of  the  22nd  Annual  Conf.  on  Infoi-m.  Sciences  and  Systems ,  pp. 
308-313,  Princeton,  N.J.,  March  1988. 

*2.  A.  K.  Rao  and  Y.  F.  Huang 

“Statistical  Properties  of  a  Novel  Recursive  Estimation  Algorithm  with 
Information- Dependent  Updating”.  Proc.  of  1988  International  Conf.  on 
Acoustics,  Speech  and  Signal  Proc.,  pp.  2436-2439.  New  York,  N.Y.  1988. 


*3.  A.  K.  Rao  and  Y.  F.  Huang 

“ARMA  Parameter  Estimation  Using  a  Novel  Recursive  Estimation 
Algorithm  with  Selective  Updating,”  abstract  appeared  in  Abstracts  of 
Papers,  1988  IEEE  International  Symp.  on  Inform.  Theory,  pp.  74-75, 

Kobe,  Japan,  1988. 

4.  M.  Ammar  and  Y.  F.  Huang 

‘‘Qualitative  Analysis  of  Quantizers  for  Signal  Detection,”  pp.  i 73-182,  C.  I. 
Byrnes,  C.  F.  Martin,  and  R.  E.  Sacks,  eds.,  Elsevier  Science  Pub.  B.  V., 
1988. 

*5.  A.  K.  Rao,  Y.  F.  Huang,  and  S.  Dasgupta 

■‘An  Extended  OBE  Algorithm  for  ARMA  Parameter  Estimation,”  Proc. 

26th  Annual  Allerton  Conf.  on  Commun.,  Control,  and  Computing ,  pp. 
229-238,  University  of  Illinois,  Urbana-Champaign,  September  28-30,  1988. 

*6.  Qiu  Huang  and  R.  Liu 

“  \  New  Efficient  Algorithm  for  Analysis  of  Piecewise-Linear  Circuits”,  1989 
IEEE  ISCAS,  (to  appear). 

V.C.  Journal  Publications 

*1.  A.  K.  Rao,  Y.  F.  Huang  and  S.  Dasgupta 

"‘ARMA  Parameter  Estimation  Using  a  Novel  Recursive  Estimation 
Algorithm  with  Selective  Updating,”  submitted  for  publication,  revised. 

2.  M.  Ammar  and  Y.  F.  Huang 

"An  M-Interval  Quantizer- Detector  Based  on  Statistical  Moments.” 
submitted  for  publication,  revised. 

3.  Y.  F.  Huang  and  E.  Y.  Zhang 

"Echo  Cancellation  in  Full-Duplex  Systems  Using  a  Modified  Stochastic 
Gradient  Estimation  Method”,  submitted  for  publication. 

*4.  S.  C.  Huang  and  Y.  F.  Huang 

“Learning  with  a  Selective  Update  Strategy  for  Neural  Networks,”  submitted 
for  publication. 

*5.  Qiu  Huang  and  R.  Liu 
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Abstract  -  This  paper  addresses  a  newly  developed  recursive  estimation  algorithm  which 
features  a  discerning  update  strategy  for  parameter  estimates.  This  is  in  contrast  to  conven¬ 
tional  algorithms’  continual  update  of  parameter  estimates.  Of  particular  interests  here  is  its 
higher  degree  of  modularity,  as  a  result  of  the  discerning  update  strategy,  and  its  potential  for 
pipelined  adaptive  signal  processing  architecture. 

INTRODUCTION 

Effectiveness  of  recursive  estimation  is  known  to  be  extremely  critical  in  the  design  and 
implementation  of  adaptive  systems.  Issues  such  as  accuracy  of  parameter  estimates,  speed  of 
convergence,  and  numerical  stability  of  recursive  estimation  algorithms  have  received  a  great 
deal  of  attention  throughout  the  years  with  numerous  books  and  journal  articles  on  these  sub¬ 
jects,  see,  e.g.,  [1-7].  Recently,  due  to  the  advent  of  the  very  large  scale  integrated  (VLSI)  cir¬ 
cuits  technology,  there  have  been  intense  interests  on  constructing  more  efficient  signal  proces¬ 
sors  to  be  compatible  with  such  technology  (8-11).  Algorithms  which  can  be  implemented 
with  higher  degrees  of  modularity  and  pipelineability  (hence  higher  degrees  of  concurrency) 
will  be  very  much  in  demand.  In  addition,  to  further  improve  the  effectiveness  of  data  pro¬ 
cessing,  estimation  algorithms  which  can  extras  information  intelligently  will  prove  to  be  prac¬ 
tically  appealing  for  adaptive  signal  processing  ./stems.  There  exist  immediate  applications  of 
such  algorithms  to  many  modem  communication  systems,  such  as  AD  PCM  for  speech  orocess- 
ing  and  adaptive  echo  cancellation  in  telecommunications. 

A  common  feature  of  most,  if  not  all,  existing  recursive  estimation  algorithms  is  the  con¬ 
tinual  update  of  parameter  estimates  without  regard  to  the  benefits  provided.  Thus  even  if  a 
new  measurement  contains  no  fresh  information  and  even  if  its  use  fails  to  result  in  any 
improvement  in  the  quality  of  estimation,  the  update  does  not  cease.  A  classical  argument  for 
that  is  to  assume  that  the  underlying  signal  sequence  is  an  independent  one.  It  thus  implies 
that  some  innovative  information  can  be  acquired  from  each  new  measurement.  In  practice, 
this  assumption  often  fails  to  hold  as  data  from  many  natural  sources,  such  as  speech  processes 
and  underwater  signal  processes,  have  exhibited  dependent  characteristics.  Another  issue  of 
much  concern  is  that  of  numerical  stability.  Experience  in  digital  filter  theory  over  the  last  two 
decades  has  shown  that  special  attention  must  be  paid  to  problems  arising  from  finite  word- 
length  implementations  [91.  Typical  such  problems  are  propagation  of  round-off  errors, 
overflow  oscillations,  and  limit  cycles.  The  continual  update  strategy  usually  involves  more 
computational  complexity,  thus  is  more  likely  to  result  in  numerical  instability.  In  addition, 
the  time  delay  due  to  updating  may  result  in  deviations  from  the  presumed  model  on  which  the 
estimation  process  is  based.  This  not  only  destroys  the  purpose  of  estimation,  but  could  also 
cause  instability  of  the  entire  system. 

In  short,  the  continual  update  strategy  of  existing  recursive  algorithms  is  not  only  redun¬ 
dant,  but  is  often  detrimental.  Recently,  a  recursive  estimation  algorithm  with  a  discerning 
update  strategy  has  been  developed  (12-15).  The  fundamental  concept  of  the  algorithm  is 
based  on  the  realization  that,  in  reality,  not  every  received  datum  contains  sufficient  innovative 
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information  to  yield  improvements  on  the  parameter  estimates.  Thus  at  every  data  point,  a 
decision  is  made  on  whether  the  received  data  contains  enough  fresh  information  to  improve 
the  parameter  estimates.  As  shown  in  Fig.  1,  the  resulting  estimator  consists  of  two  modules, 
an  information  processor  followed  by  an  updating  processor.  The  former  evaluates  the  infor¬ 
mation  content  of  the  received  data  and  decides  whether  an  update  is  necessary.  The  updating 
processor  thus  computes  a  new  parameter  estimate  only  when  the  information  processor 
decides  that  an  update  is  needed. 

PROBLEM  FORMULATION 

Many  processes  commonly  encountered  in  the  area  of  digital  signal  processing  can  be 
formulated  by  the  following  autoregressive  exogeneous  input  (ARX)  model: 

*  m 

yn  =  £  a,  yk„  +  X  b,uk-,  +  vk  (1) 

t=l  /= 0 

The  above  equation  can  be  simplified  as 

y* =  e"r  xk +  v»  (2) 

where  0’r  *  [a^ . a„  b0,  b, . bj  and  xTk  [>t_,  ....  yk_„  ut ....  ut_J.  The  problem 

of  recursive  estimation  is  essentially  concerned  with  the  evaluation  of  an  estimate  of  0',  at 
every  time  instant  k,  with  the  given  measurement  and  the  estimate  at  time  k-\. 

The  algorithm  proposed  in  [12]  and  its  extensions  [14,15]  are  derived  on  the  basis  of  a 
boundedness  assumption  on  vk.  In  particular,  assuming  that 


vi  <  y  for  all  k. 

(3) 

then  (2)  and  (3)  together  yield 

(y*  -  «'7  <  r 

(4) 

Let  Sk  be  a  subset  of  R't4m+]  defined  by 

Sk  =  (0:  (y*  -  0r  xk)2  <  y2,  y  e  R~m*' } 

(5) 

Note  that  Sk  is  a  convex  polytope  in  RK4m*\  At  any  instant  k,  consider  the  intersection  of  the 

sequence  of  the  polytopes  . Sk.  It  must  contain  the  modeled  parameter  0*.  However, 

formulation  of  this  intersection  set  is  analytically  complex.  A  better  alternative  is  to  consider 
ellipsoids  which  bound  it.  Clearly,  such  ellipsoids  must  also  contain  0*. 

Another  type  of  processes  also  commonly  encountered  can  be  modeled  by  the  following 
autoregressive-moving-average  exogeneous  input  (ARM AX)  model: 

y*  =  «ty*-t  +  ■  •  •  +  a«y*-,  +  b|«*-i  +  t>muk.m  +  V*  +  c,vt_,  +  ■  •  ■  +  crv*-„  (6) 

where  {v4}  is  a  white  noise  sequence.  Similarly  to  (2),  (6)  can  be  rewritten  as 

V*  =  yk  -  0'T*'t  (7) 

where  0’r  £  [a\,  ....  a„b\  ,  ....  b„  ct,  ....  cr],  is  the  vector  of  true  parameters.  At  time  k  if  an 
estimate  of  0’  is  available,  vk  could  be  estimated  by  ek  according  to: 

ek  =  yk-  Q'Tkx\  (8) 

with 

x * =  (  y*-i.  •■■■  y*-«  «t-i.  .  uk.m,ek.\, ....  ck-r)T 

and  9'*  being  the  estimate  of  the  true  parameter  0*.  As  such,  one  may  obtain  analogous 


formulations  for  the  convex  polytope  as  (5),  as  well  as  for  the  bounding  ellipsoids. 

THE  ALGORITHM 

Recursive  algorithms  have  been  derived  based  on  both  the  ARX  and  the  ARMAX 
models.  The  recursive  algorithm  is  initialized  with  a  sufficiently  large  ellipsoid  which  covers 
all  possible  values  of  6*.  After  (y1(  xt)  is  acquired,  one  is  to  find  an  ellipscid  which  bounds 
the  intersection  of  the  initial  ellipsoid  and  and  which  is  in  some  sense  "optimaT.  Such  an 
ellipsoid  is  denoted  by  E\.  By  the  same  token,  one  can  proceed  to  obtain  a  sequence  of 
optimal  bounding  ellipsoids  (OBE)  {£*}.  The  estimate  for  6\  at  the  k0'  instant,  is  defined  to 
be  the  center.  Thus  the  problem  of  recursive  estimation  is  converted  to  one  of  finding  a 
sequence  of  OBE’s.  The  striking  feature  of  this  procedure  is  that,  in  general,  the  OBE  needs 
to  be  updated  only  when  a  new  observation  set  intersects  it  in  such  a  way  that  the  resulting 
intersection  set  is  "significantly  smaller  than  the  previous  one.  As  shown  in  [12],  the  decision 
to  update  is  based  on  the  result  of  the  optimization  procedure.  This  essentially  results  in  an 
"information  evaluation  procedure",  as  shown  in  Fig.  1. 

The  OBE  may  be  chosen  to  be  the  one  which  has  the  minimum  volume  [12,13].  On  the 
other  hand,  one  may  also  choose  the  OBE  as  the  one  for  which  a  normalized  bound  on  the 
estimation  error  is  minimized  [14].  The  information  evaluation  procedure  involves  much  less 
computation,  compared  to  updating  of  the  parameter  estimates.  In  fact,  the  information  evalua¬ 
tion  of  [14]  involves  even  less  computation  than  its  counterpart  of  [12]. 

DISCUSSIONS 

According  to  our  simulation  experience,  only  20%  of  received  data  are  needed  for  updat¬ 
ing  estimates  and  the  results  are  as  accurate  as  those  of  the  recursive  least-squares.  This  is  true 
for  almost  all  cases  in  which  the  number  of  parameters  to  be  estimated  is  less  than  ten.  This 
result  demonstrates  tremendous  redundancy  involved  in  conventional  recursive  estimation  alg- 
rithms.  Other  features  of  the  algorithm  include  a  better  tracking  capability  [14]  for  slowly 
time-varying  parameters,  as  well  as  cessation  of  updating  thus  improves  numerical  stability  and 
the  accuracy  of  the  final  estimates 

As  a  result  of  the  modular  structure,  there  exists  a  great  potential  for  constructing  a  pipe¬ 
lined  architecture.  Furthermore,  due  to  the  simplicity  of  computation  needed  by  the  informa¬ 
tion  evaluation  procedure,  one  may  be  able  to  increase  significantly  the  data-throughput  rate  by 
constructing  a  time-sharing  type  of  processor  networks.  This  is  a  subject  ol  some  on-going 
research  projects. 
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Abstract 

This  paper  investigates  a  recently  developed  modular  recur¬ 
sive  estimation  (MRE)  [1,2]  algorithm  using  discrete  event 
models  (DEM).  It  also  proposes  a  concurrent,  pipelined  adaptive 
signal  processing  architecture  based  on  parallel  networking  of 
these  MRE's.  The  main  feature  of  the  MRE  is  a  discerning 
update  srrategy  for  parameter  estimates,  in  contrast  to  the  con¬ 
tinual  update  strategy  of  conventional  algorithms.  Not  only  does 
this  discerning  update  strategy  result  in  a  higher  degree  of  modu¬ 
larity,  but  also  does  it  facilitate  more  effective  use  of  data  infor¬ 
mation. 


I.  INTRODUCTION 

The  growing  need  of  high  speed  digital  processing  of  large 
volumes  of  data  has  resulted  in  a  great  deal  of  interest  in  signal 
processing  algorithms  which  possess  important  features  of  modu¬ 
larity,  pipelineability,  regularity,  and  flexibility.  The  advent  of 
the  very  large  scale  integrated  (VLSI)  circuit  technology  has 
made  available  low  cost  signal  processors  with  higher  degrees  of 
concurrency  in  computation.  Design  and  implementation  of 
adaptive  signal  processing  algorithms  will  undoubtedly  benefit 
from  such  progress. 

Much  work  has  been  done  to  improve  modularity  and  con¬ 
currency  in  the  computational  aspects  of  signal  processing  algo¬ 
rithms,  see,  e.g.,  [3-7],  Little  was  done  to  devise  modular  algo¬ 
rithms  at  the  "higher"  level.  Recently,  Huang  [1,2]  proposed  a 
recursive  estimation  algorithm  which  features  a  modular  struc¬ 
ture,  as  depicted  in  Fig.l.  In  particular,  the  process  of  recursive 
estimation  is  carried  out  in  two  steps:  information  evaluation  of 
the  received  data  first,  and  then  updating  of  parameter  estimates 
The  latter  proceeds  only  if  the  former  decides  that  an  update  of 
parameter  estimates  is  necessary.  In  addition  to  higher  degrees 
of  concurrency,  this  sort  of  schemes  also  have  the  benefit  of 
better  numerical  stability. 

A  common  feature  of  most,  if  not  all,  existing  recursive 
estimation  algorithms  is  the  continual  update  of  parameter  esti¬ 
mates  without  regard  to  the  benefits  provided.  Thus,  even  if  a 
new  measurement  contains  no  fresh  information  and  even  if  its 
use  fails  to  result  in  any  improvement  in  the  quality  of  estima¬ 
tion,  the  update  does  not  cease.  A  classical  argument  for  that  is 
to  assume  that  the  underlying  signal  sequence  is  an  independent 
one.  It  thus  implies  that  some  innovative  information  can  be 
acquired  from  each  new  measurement  In  practice,  this  assump¬ 
tion  often  fails  to  hold  as  data  from  many  natural  sources,  such 
as  speech  processes  and  underwater  signal  processes,  have  exhi¬ 
bited  dependent  characteristics. 
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In  general,  the  continual  update  strategy  results  in  algo- 
reloped  modular  recur-  rilhms  that  are  numerically  intensive.  It  is  thus  more  likely  t0 

using  discrete  event  result  in  numeneal  instability.  Furthermore,  the  time  delay  due 

rent,  pipelined  adaptive  to  uPdatin8  may  result  in  deviations  from  the  presumed  model  on 

parallel  networking  of  wh'ch  the  estimation  process  is  based.  This  not  only  destroys 

MRE  is  a  discerning  the  PurPose  of  estimation,  but  could  also  cause  instability  of  the 

in  contrast  to  the  con-  entire  system. 

irithms.  Not  only  does  In  addition  to  the  virtues  discussed  above,  an  outgrowth  of 

higher  degree  of  modu-  this  modular  recursive  estimation  (MRE)  is  a  parallel-pipelined 

ctive  use  of  data  infor-  networking  structure.  Note  that  many  data  samples  will  be 

rejected  by  the  information  evaluation  (IE)  procedure.  Our  simu¬ 
lation  experience  has  shown  that,  for  lower  order  systems  (say. 
N  eighth  order),  only  20%  of  the  received  data  are  used  for  updai- 

ital  processing  of  large  inS  Parameter  estimates  and  the  results  are  as  accurate  as  those 

eal  of  interest  in  signal  of  the  recursive  least-squares.  In  fact,  this  number  could  even  be 

irtant  features  of  mL-  smaller  ,n  some  ‘PpHcations  [8], 

Ability.  The  advent  of  As  shown  in  [1,2],  it  can  be  designed  so  that  the  computa- 

circuit  technology  has  tional  complexity  of  IE  is  much  less  than  that  of  the  updating 

with  higher  degrees  of  (UPD)  procedure.  As  a  result,  both  the  IE  and  the  UPD  could 

nd  implementation  of  involve  a  good  amount  of  idle  time.  A  viable  idea  is  thus  to 

ill  undoubtedly  benefit  consider  a  parallel,  pipelined  network  configuration  of  such 

modular  estimators  to  (multiplex)  process  signals  from  multiple 
re  modularity  and  con-  channels.  In  this  case,  issues  of  data  throughput  and  those  of 

signal  processing  al"o-  routing  (or  queuing)  of  parameter  estimates  become  complicated 

o  dev.se  modular  algo-  and  have  >*  addressed. 

luang  [1,2]  proposed  a  The  purpose  of  this  paper  is  to  investigate  these  aforemen- 

, lures  a  modular  struc-  tioned  issues,  and  effectiveness  of  the  parallel,  pipelined  network 

he  process  of  recursive  architecture.  Realizing  that  data  throughput  and  routing  in  signal 

formation  evaluation  of  processing  are  sequences  of  discrete  events,  we  invesugate  the 

of  parameter  estimates.  potential  of  DEM  to  facilitate  our  study. 


II.  A  MODULAR  RECURSIVE  ESTIMATION  ALGORITHM 
This  section  summarizes  the  estimation  algorithm  proposed 
in  [1.2].  Consider  the  following  auto-regressive  exogeneous 
input  (ARX)  model: 

yt  =  £  Tt-,  +  £  b,uH  +  vt  (U 

P=t  p=0 


where  S’7  d  [ai  ...  apb0bl  ■  ■  ■  b„] 

4  »  b*-i  ■  yt-p  The  problem  of  rrxvv* 

estimation  is  essentially  concerned  with  the  evaluanon  of  as®” 
mate  of  9*,  at  every  time  instant  4,  with  the  given  measure*^ 
iypjtk)  and  the  estimate  at  time  4-1. 
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Figure  1:  A  modular  recursive  estimator. 
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The  algorithm  proposed  in  [1,2]  is  derived  on  the  basis  of  a 
boundedness  assumption  on  v*  namely, 

vl  S  y2.  for  all  k.  (3) 

parameter  estimates  resulting  from  this  algorithm  is  formulated 
as  follows: 

6*  =  9*-i  +  *t  St 


- _ l _ rp  a  f*-l  v[  'Vl  . 

Pk  l-Xt[  t-‘  ‘  l-X*  +  XtG*  1 


o*  *  (1-Xt)<ji,  +  X*  y2  - 


XtO-X^ 


(4a) 

(4b) 

(4c) 


1-Xt  +  XkGt 

where  9k  is  the  parameter  estimate  at  time  k,  5t  £  y4  -  **  Qk_{  is 
the  prediction  error,  and  Gt  £  The  variable  Xt  is 

defined  as  follows: 

(0  If  y2  2  <tj_[  +  6j  then  Xt  =  0.  (5a) 

(ii)  Otherwise,  Xt  =  min(ct,  V*);  where  (5b) 


ot  = 


a 

if  =  0 

(6a) 

1-Pt 

2 

Hh 

o 

I* 

II 

(6b) 

_J L_  fl  _-s 

1  i 

if  Pt(Gt-l)  +  1  >0 

(6c) 

\-ck 

V  l+pt(G*-l)' 

a 

if  (3t(Gt-l)  +  1  SO 

(6 d) 

with  pt  ^  (Y2  -  ai-i)/82. 

Note  that  the  decision  of  update  or  not  depends  on  the 
value  of  Xt  evaluated  by  Eqs.  (5).  In  general,  the  algorithm  can 
be  initialized  with  90=O  and  f’0=//£.  where  0  and  /  are  null  vector 
and  identity  matrix  of  appropriate  dimensions,  respectively;  and  e 
is  a  very  small  positive  real  number. 


III.  A  DEM  FOR  THE  MRE 

In  this  section,  it  is  shown  that  DEM  are  employed  to 
investigate  the  data  flow  and  timing  of  event  sequences  associ¬ 
ated  with  computation  involved  in  the  MRE.  In  particular,  Petri 
nets  are  used  to  model  events  and  states  corresponding  to  com¬ 
putation  at  three  different  levels.  These  models  provide  a  clair¬ 
voyant  picture  of  blocks  of  computation  that  can  be  earned  out 
concurrently.  They  also  show  propagations  of  data  computation 
in  information  evaluator  and  updating  processor. 

Note  that  the  models  used  here  are  the  so-called  (extended) 
timed  Petri  nets  [9-11].  In  these  Petri  net  models,  there  are 
essentially  two  types  of  transitions:  immediate  transitions  and 
timed  transitions.  In  our  study,  immediate  transitions  signify 
events  such  as  shift,  compare,  or  simply  synchronization.  On  the 
other  hand,  timed  transitions  stand  for  computations,  such  as 
additions  or  multiplications,  which  take  non-negligible  amounts 
of  time.  Depending  on  the  level  of  modeling,  this  transition  time 
may  be  deterministic  or  random.  These  models  are  shown  in 
Figs.  2-5,  in  which  ban  stand  for  immediate  transitions  whereas 
rectangles  are  timed  transitions. 

Figure  2  depicts  propagation  of  data  computation  in  IE.  Let 
ta  and  T„  denote  the  times  needed  for  each  addition  and  for  each 
multiplication  of  two  real  numbers,  respectively.  It  is  assumed 
here  that  an  unlimited  number  of  multipliers  and  adders  are 
available  concurrently  and  that  memory  access  and  data  transfer 
are  events  which  consume  no  time.  Thus,  as  shown  in  Fig.  2, 
with  concurrent  multiplications  IE  can  be  accomplished  in 
2tm  +  tJI-mc],  where  K  is  the  integer  satisfying  ic-lSIogjnSK 
with  /t=p-*y+l.  A  building  block  of  matrix-vector  multiplication 
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is  shown  in  Fig.  3,  which  also  shows  chat  such  an  operation 
elapses  rw-HCta  time  units. 

An  ETPN  model  for  a  complete  cycle  of  the  MRE  process 
is  depicted  in  Fig.  4.  Note  that  conflicting  transitions,  such  as  ru 
and  ti;,  fjj  and  tjj,  etc.,  are  used  to  model  decision  making  for¬ 
mulated  by  Eqs.  (5)  and  (6).  Thus,  given  a  set  of  data  samples, 
the  sequence  of  events  in  IE  and  UPD  and  the  corresponding 
timing  can  be  obtained  easily  from  Figs.  (4a)  and  (4b).  Note 
that  the  time  consumed  by  each  cycle  of  IE  and  UPD  is  deter¬ 
mined  provided  that  the  routing  of  data  flow  is  given. 

In  Fig.  5,  an  ETPN  is  used  to  depict  the  MRE  at  the  higher 
level.  It  is  seen  that  transitions  h  and  r3  are  conflicting,  and  that 
t4  is  a  randomly  timed  transition  whereas  t5  is  deterministically 
timed.  Routing  of  data  flow  depends  on  probabilities  of 
occurrences  of  event  r,  and  r3,  which  may  be  determined  from 
the  sampled  data  and  the  underlying  system  characteristics.  Ran¬ 
domness  of  r4  is  a  result  of  rouung  as  seen  in  Fig.  4(a).  Its 
firing  time  is  a  discrete  random  variable  which  assumes  one  of 
only  three  values,  depending  on  the  decisions  formulated  by 
Eqs. (6). 

Examining  Figs. (4)  and  (5)  reveals  that  completion  of  one 
cycle  of  IE  needs  2tm+TJ(K+l).  and  completion  of  UPD  requires 
5t^+Ta(ic-t-l)  +  T,  where  T  is  the  random  transition  time  of  r4  in 
Fig.5.  In  general,  one  can  expect  that  the  average  value  of  T  is 
in  the  order  of  3t„+icra.  Thus  if  all  sampled  data  proceed 
through  both  IE  and  UPD,  the  latter  will  become  a  bottle-neck  of 
the  estimation  process  while  the  former  will  have  a  significant 
amount  of  idte  time.  On  the  other  hand,  if  a  large  percentage  of 
data  stops  at  the  completion  of  IE,  the  UPD  may  frequently  be 
idle.  Considering  either  scenario,  one  sees  the  inefficient  use  of 
the  signal  processors.  Furthermore,  in  some  practical  cases,  the 
sampling  rate  may  be  greater  than  the  data  throughput  rate  of  IE 
and  UPD.  In  such  a  situation,  real-time  signal  processing  may 
be  seriously  handicapped.  One  of  the  possible  methods  to 
resolve  such  -’fficiency  and  handicap  is  a  parallel-pipelined 
architecture  to  -  discussed  in  the  next  section. 

IV.  DESIGNING  A  PARALLEL-PIPELINED 
ARCHITECTURE  FOR  MULTI-CHANNEL 
ADAPTIVE  SIGNAL  PROCESSING 

The  objective  of  this  section  is  to  investigate  design  metho¬ 
dologies  which  may  improve  data  throughput  rate  and  which 
may  result  in  more  efficient  use  of  signal  processors.  We  also 
hope  to  examine  the  power  of  DEM  as  an  aid  to  the  design  pro¬ 
cedure.  The  architecture  proposed  here  is  shown  in  Fig. 6.  It  is 
assumed  here  that  signals  are  received  from  iVl  different  chan¬ 
nels,  and  that  N2  IE’s  and  N2  UPD’s  are  available.  In  general,  it 
may  be  desirable  that  both  N2  and  iV3  are  less  than  ,V[  for  cost- 
effectiveness  purposes.  Nevertheless,  such  may  not  be  the  case 
if  higher  throughput  rates  and  reliability  are  more  critical. 

Consider  the  following  scenario:  All  signals  are  received  at 
the  same  sampling  rate  which  is  much  faster  than  data 
throughput  rates  of  both  IE  and  UPD,  and  only  a  small  percen¬ 
tage  of  samples  proceeds  through  the  UPD.  Assume  that  sam¬ 
pled  data  and  parameter  estimates  for  all  channels  can  be  stored 
in  buffers  A  and  B.  Thus  all  IE  and  UPD  will  be  transparent  to 
different  channels.  In  essence,  both  buffer  A  and  buffer  B  are 
queues  of  customers  which  are  sharing  and  competing  for  a  lim¬ 
ited  number  of  resources.  An  important  issue  here  will  be  the 
design  of  the  numbers  N2  and  N3  as  functions  of  N j,  sampling 
rates,  data  throughput  rates,  channel  statistics,  and  other  applica¬ 
tion  constraints.  In  practice,  this  is  often  an  issue  of  tradeoffs,  as 
opposed  to  optimality.  Also,  the  queueing  discipline  of  buffer  A 
and  buffer  B  need  to  be  designed  as  well.  In  particular,  there 
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may  be  different  priorities,  in  terms  of  the  desperateness  for 
parameter  updates,  for  different  channels.  Tracking  of  variations 
of  parameters  and  system  characteristics  should  also  be  of  much 
concern,  especially  in  the  content  of  overall  system  stability. 
These  issues  will  be  further  complicated  if  sampling  rates  for 
different  channels  are  different. 

The  virtues  of  this  type  of  signal  processing  architecture  are 
improvements  of  data  throughput  rates,  cost  reduction  of  signal 
processing  hardware,  and  improvements  of  reliability.  If  some  of 
the  IE’s  or  UPD’s  fail,  the  network  will  still  be  able  to  function 
properly.  It  would  also  benefit  from  efficient  use  of  information 
contained  in  the  received  data,  a  potential  for  interplay  of  adap¬ 
tive  signal  processing  and  artificial  intelligence. 

Accomplishments  of  these  design  problems  will  certainly 
rely  on  effectiveness  of  the  modeling  tool  available.  State-of- 
the-art  models,  such  as  finite  state  machine  (FSM),  Petri  nets 
(PN),  and  finitely  recursive  processes  (FRS),  will  be  investigated 
as  to  their  suitability  to  this  signal  processing  design  problem. 

V.  CONCLUSIONS 

The  propagation  of  data  computation  and  timing  of  a  modu¬ 
lar  recursive  estimation  algorithm  are  examined  using  a  discrete 
event  model.  The  key  to  the  MRE  algorithm  is  a  decision¬ 
making  regarding  the  information  content  of  the  received  data. 
As  a  result,  it  features  modularity  at  the  higher  level,  as  opposed 
to  the  computational  level.  This  feature  enabled  us  to  consider  a 
parallel-pipelined  architecture  for  adaptive  signal  processing  net¬ 
work  which  makes  more  efficient  use  of  signal  processors  and 
will  facilitate  achievement  of  real  time  signal  processing. 
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RECURSIVE  ARMA  PARAMETER  ESTIMATION  WITH  A  DISCERNING 
UPDATE  STRATEGY-  FINITE  PRECISION  EFFECTS 
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University  of  Notre  Dame  University  of  Notre  Dame 


Introduction.  The  performance  of  adaptive  filter  algorithms  in  finite  precision  environments 
has  received  a  lot  of  attention  in  the  past  few  years.  The  problem  is  important  because  a  practical 
implementation  of  these  algorithms  will  impose  constraints  on  the  word-length,  which  may  cause 
significant  degradation  in  the  performance.  For  example,  some  of  the  fast  least-squares  algorithms, 
though  appealing  in  theory,  have  been  found  to  be  unstable  in  finite  word- length  implementations  [1], 
Round-off  and  quantization  errors  affect  different  adaptive  algorithms  in  different  ways.  The 
accumulation  of  round-off  errors  in  the  recursive  least-squares  (RLS)  algorithm  can  cause  the  inverse 
of  the  associated  estimated  covariance  matrix  to  become  indefinite  and  the  algorithm  to  diverge  fairly 
iarly,  especially  if  the  order  of  the  filter  is  large{2,3].  This  effect  is  pronounced  if  the  data  is  ill 
conditioned,  i.e.,  the  data  autocorrelation  matrix  has  a  large  eigenvalue  spread.  On  the  other  hand,  it 
can  take  millions  of  iterations  before  the  effect  of  quantization  errors  becomes  noticeable  in  the  widely 
used  LMS  algorithm^]. 

In  this  paper  we  first  study  the  effects  of  roundoff  errors  in  a  fixed  point  implementation  of  the 
so-called  Optimal  Bounding  Ellipsoid  (OBE)  algorithm^].  This  algorithm  estimates  recursively  the 
coefficients  of  autoregressive  with  exogenous  inputs  (ARX)  processes.  One  of  the  main  features  of 
this  algorithm  is  a  discerning  update  strategy.  This  feature,  obtained  by  the  introduction  of  an 
information  dependent  updating/forgetting  factor,  yields  a  modular  structure  thereby  increasing  the 
potential  for  concurrent  and  pipelined  processing  of  signals.  The  presence  of  such  a  forgetting  factor 
also  gives  the  algorithm  the  ability  to  track  time  varying  parameters. 

The  OBE  algorithm  belongs  to  a  broad  family  of  algorithms  known  as  membership  set  parameter 
estimation  algorithms  [6], [7], [8],  These  algorithms  are  particularly  useful  when  the  statistical 
properties  of  the  noise  sequence  (v(t))  are  unknown,  but  instantaneous  bounds  on  its  magnitude  are 
available.  In  the  past  few  years,  there  has  been  a  resurgence  of  interest  in  these  algorithms.  However, 
the  key  issue  of  finite  precision  effects  has  not  received  much  attention.  We  have  found  that  in  small 
word-length  situations,  the  performance  of  the  OBE  algorithm  is  superior  to  that  of  the  RLS  algorithm 
(with  and  without  forgetting  factor).  The  EOBE  algorithm,  which  is  an  extension  of  the  OBE 
algorithm  to  ARMA  models[9]  is  studied  next  and  simulation  results  also  indicate  that  the  EOBE 
algorithm  has  better  numerical  properties  than  the  extended  least-squares  (ELS)  algorithm(see[3]for 
details  of  the  RLS  and  ELS  algorithms). 

This  paper  is  organised  as  follows:  The  first  section  introduces  the  concept  of  membership-set 
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parameter  estimation  and  describes  in  fuller  detail  the  OBE  algorithm.  The  next  section  presents  the 
extension  of  the  OBE  algorithm  for  parameter  estimation  of  ARMAX  processes. The  simulation 
procedure  and  the  simulation  results  are  presented  in  the  following  two  sections.The  paper  concludes 
with  discussions  of  the  simulation  results. 

The  OBE  Algorithm.  Membership-set  parameter  estimation  is  concerned  with  the 
determinadon  of  sets  of  parameters  which  are  consistent  with  the  measurements,  model  structure,  and 
noise  constraints.  The  model  and  noise  representations  commonly  used  are 

y(t)  -e*T<t>(t)+v(t),  lv(t)l  £  rlrt(t)  {\) 

where  (y)  is  a  sequence  of  scalar  observations,  0*  is  the  parameter  vector  to  be  identified,  <t>(t)  is  a 
n-vector  of  variables  known  at  time  t,  ( v }  is  the  noise  sequence  and  ±  r1/2(t)  are  the  time  varying  noise 
bounds.  Given  a  sequence  (y(i),$(i)|,  i*l..k,  the  optimal  membership  set 

A-nki.i  Si 

where 

S,»  (9:  ( y(i)  -  0T  $>(i)  )2  £r(i),  0eR"  ) 

From  a  geometrical  viewpoint,  S;  is  a  convex  polytope  in  Rn  and  contains  the  true  parameter  vector. 
Finding  \p°k  is  often  computationally  intractable  and  it  is  therefore  necessary  to  approximate  by 
some  set  which  approximates  it  closely  and  which  can  be  described  and  updated  economically.  The 
different  membership-set  algorithms  differ  in  the  way  the  optimal  membership  set  is  approximated  and 
in  the  method  used  to  obtain  an  optimum(in  some  sense)  set 

The  OBE  algorithm  estimates  the  coefficients  of  ARX  processes  described  by 
y(t)  *  *1  y(t-l)+..+  a„  y(t-n)  +  b0  u(t)  +  b,  u(t-l)  +..+  bm  u(t-m)  +  v(t) 
where  y(t)  is  the  output,  u(t)  is  the  input  and  v(t)  is  the  noise  contaminating  the  observations. 

The  above  equation  can  be  recast  as  . 

y(t)  -9*T<iKt)  +v(t) 
where 

*  1*1  •  *1  —  *n  *  bo  .  bt . bm  )T 

is  the  vector  of  true  parameters,  and 

<$(0  ■  ( y(t-I),  y(t-2), ..  y(t-n),  u(t),  u(t-l), ..  u(t-m) ) T 
is  the  regressor  vector.  It  is  assumed  that  the  noise  is  uniformly  bounded  in  magnitude,  i.e., 
there  exists  y0  i  0 ,  such  that 
v  *(t)  £  y0l  for  ail  t,  hence 
(y(t)  -  0^0(0  )J  <  yQ2 

Let  S,  be  a  subset  of  the  euclidean  space  R "  *  m+i ,  defined  by 
S,-  (9:  ( y(t) -  9T<I>(t))2  £  y0z  .OeR"*'"*1  ) 

The  OBE  algorithm  starts  off  with  a  large  ellipsoid.  Eg  .  in  which  contains  all  possible  values 
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of  the  modelled  parameter  0*.  After  the  first  observation  y(l)  is  acquired,  an  ellipsoid  is  found  which 
bounds  the  intersection  of  Eg  and  the  convex  polytope  S,.  To  hasten  convergence,  this  ellipsoid  must 
be  optimized  in  some  sense,  say  minimum  volume[7]  or  by  any  other  criterion[5,10].  Denoting  the 
optimal  ellipsoid  by  E{  one  can  proceed  exactly  as  before  with  the  future  observations  and  obtain  a 
sequence  of  optimal  bounding  ellipsoids(OBE)  (  E, ). 

The  center  of  the  ellipsoid  Etcan  be  taken  as  the  parameter  estimate  at  the  t-th  instant  and  is 
denoted  by  0(t).  If  at  a  particular  dme  instant  i,  the  resulting  optimal  bounding  ellipsoid  would  be  of  a 
"  smaller  size  ",  thereby  implying  that  the  data  point  y(i)  contains  some  "  information  "  regarding  ihe 
parameter  estimates,  then  the  parameter  estimates  are  updated.  Otherwise  E;  is  set  equal  to  Ei  lt  and 
the  estimates  are  not  updated.  In  essence,  the  recursive  estimator  consists  of  two  modules,  an 
information  evaluator  followed  by  an  updating  processor.  At  each  data  point,  the  received  data 
proceed  to  the  updating  processor  oniy  if  the  information  evaluator  indicates  that  some  fresh 
information  is  contained  in  the  data. 

Specifically,  let  the  ellipsoid  Et.,  at  the  (t-I)-th  instant  be  formulated  by 

E,.,-  {8:  (9-0(M))TP-‘(t-l)  (  8  -  8  (t-1) )  £  cr2  (s-1)  ) 
for  some  positive  definite  matrix  P  (t-1)  and  a  non-negative  scalar  a2  (t-1).  Then,  given  y(t).  an 
ellipsoid  which  bounds  E, ,  n  St  "tightly"  is 

{  6  :  (1  -  X, )  (0  -  8  (t-1)  )T  P'Vt-l)  (0-8  (t-1)  )  +  X,  ( y(t)  -  0Td>(t)  )2 

S  (1- \)  oVl)+  \Y0  ) 

where  the  forgetting  factor  \  satisfies  0  Z  Xj  S  a  <1,  with  a  being  a  user  chosen  upper  bound  on  the 
forgetting  factor.  The  size  of  the  bounding  ellipsoid  is  related  to  the  scalar  a3  (t-1)  and  the  eigenvalues 
ofP(t-l).  The  update  equations  for  8(t),  P(i)  and  a2!!),  derived  in  (5),  are  as  follows 


0(t)  -  0(t-l)  +  K(t)8(t) 

(2a) 

5(t)  «  y(t)  -  8T  (t- 1 )  *X*(t) 

(2b) 

XP(t-l)0(t) 

K(t)  — : - 

(2c) 

l-X,+  X(G(t) 

G(t)  -  <C*T(t)  P(t-l)  <t»(t) 

(2d) 

P(t)  -  _L  [  I .  K(t)d>T(t)  ]  P(t-l) 

(2e) 

where  <P(t)  is  the  regressor  vector  which  contains  present  and  previous  input  and  output  samples. 

The  optimal  ellipsoid  which  bounds  the  intersection  of  Et.l  and  S,  is  defined  in  terms  of  an  optimal 
value  of  X,.  For  the  OBE  algorithm  of  [S],  the  optimum  value  X*,  is  determined  by  minimization  of 
o2(t)  with  respect  to  X,  at  every  time  instant.  The  minimization  procedure  results  in  a  discerning 
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update  procedure.  In  particular,  X*(  is  set  equal  to  zero  (no  update)  if 
oHi)  +  82(0  Sy0l(t) 

On  the  other  hand,  if  (3)  is  not  satisfied,  then  the  optimal  value  of  X,  is  computed  as  follows: 

X*.  ■  min  (a,  vt ) 
where 


and 


if  8J(t)  -  0 
if  G(t)  ■  1 


1 

l-G(t) 

a 


0(0 

1+  3(0  ( G(t)  -1) 


if  p(t)  ( G(t)-l)  +  1  >0 
if  P(t)  ( G(t)  -  1)  +  1  SO 


P(0  ^ 


y2-q2(t-i) 

52(t) 


(3) 

(4) 


The  above  recursions(2),  and  the  selective  update  criterion  (3,4),  along  with  the  inidal  values 
P-,(0)  « 1 ,  0  (0)  *  0  and  o2(0)  »  1/e  with  e  «1 

form  the  basis  of  the  OBE  estimation  algorithm.  Note  that  the  OBE  algorithm  is  similar  in  form  to  the 
weighted  recursive  least-squares  (WRLS)  algorithm,  with  the  information  dependent  updating  factor 
acting  as  a  weighting  factor  on  the  observations.  Note  also  that  the  complexity  of  the  information 
evaluation  procedure  (3)  is  much  less  than  that  of  the  updating  procedure  (2). 


The  Extended  OBE  alforithm.  An  ARMA  (n.r)  process  is  of  the  form 
y(t)  ■  a,  y(t-l)+..+  ^ y(t-n)  +  w(t)  +  c,  w(t- 1)  +..+  cr  w(t-r)  (5) 

where  y(t)  is  the  output  and  w(t)  is  the  input  noise  which  is  assumed  to  be  uncorrelated  and  unknown. 
If  w(t)  is  assumed  to  be  bounded  in  magnitude  by  y0 .  then  the  OBE  algorithm  can  be  extended  to  this 
ARMA  parameter  estimation  problem,  if  estimates  of  w(t-l),  w(t-2)..,w(t-r)  are  available[9],  The 
algorithm  is  essentially  the  same  except  for  the  following  changes: 

(i)  The  regressor  vector  is  now  given  by 

O(t)  -  ( y(t-l) ,  ..  y(t-n) ,  e(t-l) ,  ...  e(t-r)  ]  T 
where 

e(t)  -  y(t)  -  0T(tWt) 

is  the  a  posteriori  prediction  error  and  the  parameter  estimate  0(1 )  is  the  estimate  of  the  aif  i  *  1 ,2,..,n 
and  Cj,  j  •  IX-Jt. 

(ii)  The  y0  in  (3),  which  is  the  upper  bound  on  the  noise,  is  replaced  by  y,  an  upper  bound  on  the 
magnitude  of  the  output  y(t). 
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It  is  easily  shown  that,  for  the  EOBE  algorithm,  minimizing  o2(t)  with  respect  to  X,  at  every  time 
instant  yields  the  same  updating  criterion  (3)  and  the  same  algorithm  for  determining  the  optimum 
value  of  the  forgetting  /  updating  factor  X*( ,  as  in  [3].  The  algorithm  thus  retains  the  discerning  update 
strategy  and  the  modular  adaptive  filter  structure. 

Simulation  Setun.  A  fixed  point  implementation  of  the  OBE  algorithm  was  simulated  by 
performing  the  operations  in  integer  arithmetic.  The  input  and  output  observations,  which  are 
generated  as  floating  point  numbers,  are  converted  to  integers  by  the  formula 
INT(  x.  2'bil  +  C.Dj .  *  >  0 

Xqu*ni  3 

INT(  x.  2ibi‘  -  0.5)  ,xS0. 

where  tbit  is  the  number  of  bits  assigned  for  the  integer  representation  of  the  fractional  part  of  the  real 
number  x.  In  the  simulations,  since  an  integer  is  stored  in  32  bits,  all  registers  and  word  sizes  are  32 
bits.  Multiplication  is  performed  by  forming  the  product  in  a  48-bit  word,  scaling  down  by  2'lbit,  and 
then  rounding  off  to  the  nearest  integer.  Inner  products  are  formed  similarly  by  accumulating  the 
products  in  a  48-bit  word,  scaling  down  and  then  rounding  off. 

The  upper  bound  a  on  the  forgetting  factor,  has  to  be  chosen  with  care  in  the  fixed  point 
implementation  of  the  OBE  and  EOBE  algorithms.  If  a  is  chosen  greater  than  0.1,  then  the  elements  of 
the  matrix  P  often  increase  rapidly  in  magnitude  and  overflows  can  occur.  The  reason  for  this  is  that  in 
the  initial  stages,  the  optimum  value  of  the  forgetting  factor  X  equals  a  fairly  often.  Consequently, 
since  1-  X  appears  in  the  denominator  of  (2e),  the  magnitude  of  the  elements  of  P  can  increase  and 
cause  overflows.  On  the  other  hand,  if  a  is  chosen  too  small  then  the  algorithm  takes  more  iterations 
to  converge  and  the  number  of  updates  increases.  A  value  of  a  *0.1  was  found  to  yield  a  satisfactory 
convergence  rate  and  inhibit  overflows  in  the  update  equation  for  P(t). 

In  addition  to  a ,  the  initial  value  a2  (0)  has  to  be  chosen  small  enough  to  prevent  overflows  in  the 
subsequent  calculations  of  X*.  This  is  because  if,  at  any  time  t,  o2  (i-l)  is  large  and  82(t)  is  small 
then  p  *  (y^o2  (t-1))/  S2(t)  can  become  a  very  large  negative  number  and  the  product  p(G-l)  can 
overflow.  However,  if  overflows  can  be  detected  and  a  saturation  value  is  used  for  P,  then  the 
calculation  of  X*  will  not  be  affected.  Since  P  is  negative  and  large  in  magnitude.  1+P(G-1)  is  a  large 
positive  or  negative  number,  depending  on  whether  G  is  greater  than  or  less  than  unity.  In  case 
l+P(G-l)  is  positive,  then  it  can  be  seen  from  (4)  that  v(  is  greater  than  unity,  and  consequently  X*= 
a.  On  the  other  hand  if  l+P(G-l)  is  negative  then  X*»  a  from  (4).  Thus  large  values  of  o2(0)  can  be 
used  if  care  is  taken  to  account  for  overflows  in  the  algorithm  for  calculating  X*.  In  our  simulations, 
the  initial  (unquantized)  value  taken  is  a2  (0)  =»  1 00. 

For  the  RLS  algorithm,  the  initial  value  P(0)  is  also  important  Since  the  bias  in  the  estimates  is 
inversely  proportional  to  P(0),  P(0)  should  be  large.  However  large  values  can  cause  the  Kalman  gain 
vector  K  to  overflow,  and  the  parameter  estimates  to  grow  exponentially  in  the  initial  stage  (11). 
Therefore  a  compromise  value  P(0)  »  10 1,  where  I  is  the  identity  matrix,  was  chosen. 
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Simulation  Results.  To  compare  the  performance  of  the  OBE  algorithm  vis  a  vis  the  RLS  and 
EWLS  algorithms,  simulations  were  performed  with  an  AR(4)  and  an  ARX(4,4)  process 
Example  1.  AR(4)  process 

y(t)  -  -0.6  y(t-l)  -1.58  y(t-2)  -  0.464  y(t-3)-  0.5576  y(t-4)  +  v(t) 

The  noise  sequence  (v(t)}  is  generated  by  a  pseudo-random  number  generator  with  a  uniform 
probability  distribution  in  [-1.0,  1.0].  The  upper  bound  y*  was  set  equal  to  1.0.  The  parameter 
estimates  were  obtained  by  applying  the  OBE  and  the  RLS  algorithms  to  500  point  data  sequences. 
Twenty  five  runs  of  the  algorithm  were  performed  on  the  same  model  but  with  different  noise 
sequences.  The  number  of  bits  used  for  the  fractional  part,  ibit,  was  varied  from  16  down  to  8  bits. 
The  average  squared  parameter  error  L  is  computed  for  each  value  of  ibit  according  to  the  formula 

L  4i(V0’)T(V8,) 

1-1 

where  9j  is  the  final  parameter  estimate  in  the  j-th  run  and  9*  is  the  true  parameter.  The  average  tap 
error  for  the  OBE,  RLS  and  exponentially  weighted  RLS(EWLS)with  forgetting  factor  X  =0.99,  is 
plotted  against  ibit  in  Fig.l.  It  can  be  seen  that  the  performance  of  the  OBE  algorithm  appears  to  be 
constant  as  the  number  of  bits  varies  from  16  to  9.  In  contrast,  the  performance  of  the  RLS  algorithm 
degrades  substantially  as  the  word-length  decreases.  The  performance  of  the  EWLS  algorithm  is  even 
worse.  The  RLS  and  EWLS  algorithms  overflowed  for  ibit  S  8.  The  OBE  algorithm  overflowed  for 
ibit  £  7. 

Example  2.  ARX(4,4)  process 

y(t)  ■  0.5y(t- 1  )-O.4y(t-2)+0.6y(t-3)+O.2y(t-4)  +  u(t)-0.29u(t-l)+0.5u(t-2)-0.7u(t-3)  +  v(t) 

The  input  and  noise  sequences  are  generated  by  a  pseudo-random  number  generator  as  before.  The 
average  tap  error  L,  for  the  OBE,  RLS  and  EWLS  algorithms  is  plotted  against  ibit  in  Fig.  2.  As 
before,  the  average  up  error  of  the  OBE  algorithm  appears  constant  as  ibit  varies  from  16  to  7  bits. 
The  RLS  and  EWLS  algorithms  do  not  work  well  for  ibit  £  8. 

Simulations  were  also  performed  for  an  ARX(10,I0)  model.  However  the  large  order  seems  to 
have  caused  greater  accumulation  of  round-off  errors  in  both  the  RLS  and  OBE  algorithms  and 
consequently  overflows  occurred. 

Example  3. 

The  performance  of  the  EOBE  algorithm  was  evaluated  by  simulating  an  ARMA(3,3)  process 
y(t)  -  -0.4  y(t-l)  +  0.2  y(t-2)  +  0.6  y(t-3)  +  w(t)  -  0.6  w(t-l)  +  0.2  w(t-2)  +  0.6  w(t-3) 

The  noise  sequence  (w(t))  is  generated  by  a  pseudo-random  number  generator  with  a  uniform 
probability  distribution  in  (-1.0,  1.0].  The  upper  bound  yJ  was  set  equal  to  25.0.  The  average 
parameter  error  L  is  plotted  for  the  EOBE  and  ELS(with  forgetting  factor  X  »1  and  X  =0.99) 
algorithms  in  Fig.  3.  As  in  the  previous  case,  it  can  be  seen  that  while  the  performance  of  the  EOBE 
algorithm  is  fairly  constant  over  a  range  of  word-lengths,  the  ELS  algorithm  does  not  perform 
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properly  for  ibit  <  9.  The  performance  of  the  ELS  algorithm  with  a  forgetting  factor  of  0.99  war 
worse.  The  algorithm  overflowed  for  ibit  <  13. 

Discussions.  The  superior  performance  of  the  OBE  (EOBE)  algorithms,  as  compared  to  the 
RLS  (ELS)  algorithms,  is  quite  encouraging.  One  of  the  reasons  could  be  the  selective  update  strategy 
of  the  OBE  algorithm.  Such  an  update  strategy  may  be  responsible  for  a  slower  accumulation  of 
roundoff  errors  on  account  of  the  updates  being  performed  infrequently.  Hence,  if  the  RLS(ELS)  and 
OBE(EOBE)  algorithms  operate  on  large  sets  of  data,  then  the  OBE(EOBE)algorithm  could  be  less 
prone  to  divergence,  simply  because  it  does  not  update  as  often. 

The  difference  in  performance  could  also  result  from  the  differences  in  the  update  equation  for  P(t). 
The  update  equadon  for  the  RLS  (ELS)algorithm  with  a  forgetting  factor  X  is 

P(t-l)Q(t)d>T(t)  j  P(t-l)  (6) 

X  +  0T(t)P(t-l)d>(t)  X 
The  corresponding  equadon  for  the  OBE(EOBE)  algorithm  is  (2e),  which  can  be  rewritten  as 

pm- [i — {7) 

l-X  t  l-X 

— U<p  (t)P(t-n<l>(t)  1 

Since  1-  X,  plays  the  same  role  in  the  OBE  algorithm  as  does  X  in  the  RLS  algorithm,  the  only 
difference  between  (6)  and  (7)  is  that  the  factor  (1-  X, )/  X,  appears  in  the  denominator  of  the  term 
within  braces  in  (7)  as  opposed  to  the  corresponding  term  X  in  (6).  The  degradation  of  performance 
occurs  primarily  because  the  term  within  braces  becomes  indefinite(has  positive  and  negative 
eigenvalues)  on  account  of  round-off  errors.  Since  X,  is  usually  much  smaller  than  unity,  the  term 
which  is  being  subtracted  from  the  identity  matrix  in  (7)  is  much  smaller  than  the  one  in  (6).  Thus  P(t) 
in  the  RLS(ELS)  algorithm  has  a  greater  tendency  to  become  indefinite  than  the  P(t)  in  the 
OBEfEOBE)  algorithm.  This  observation  has  been  confirmed  by  examining  me  eigenvalues  of  P(t), 
for  runs  in  which  the  RLS  algorithm  performed  poorly. 

The  failure  of  the  RLS  algorithm  when  the  order  is  large(>10)  is  well  known  and  there  exist  several 
methods  like  the  UDU'  [2.3]  and  QR  factorization  [4]  methods,  to  make  the  P  update  numerically 
stable.  For  the  OBE  algorithm,  a  recently  proposed  systolic  array  implementation  12]  may  have  better 
numerical  properties.  The  derivation  of  other  numerically  stable  recursions  for  the  OBE  algorithm  is 
currently  under  investigation. 

Conclusions.  The  finite-precision  performance  of  the  OBE  and  the  EOBE  algorithms  has  been 
studied  through  simulations.  The  performance  of  these  algorithms  in  small  word-length  environments 
is  superior  to  that  of  the  well  known  RLS  and  ELS  algorithms  .  The  improvement  is  attributed  to 


P(t)  *  [  I 
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differences  in  the  resursion  of  the  matrix  P(t)  and  less  number  of  updates  of  the  OBE  algorithm.  For 
large  order  processes,  both  the  RLS  and  the  OBE  algorithm  did  not  work  properly  when  the 
word-length  was  small  and  hence  more  numerically  robust  algorithms  may  be  required  for  such 
situations. 
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Abstract 

This  paper  investigates  a  t wn  laser  tirti/u  ml  uciinil  network  wliit  li  i  munis  of  one 
hidden  layer  of  second  order  neurons  and  on  output  layer  of  first  order  neurons.  The  second 
order  neurons  yield  conic-surface  type  of  decision  regions  such  as  ellipses  and  parabolas  A 
simulation  example  is  presented.  It  is  shown  that  such  networks  yield  more  flexibility  than 
first  order  ones.  Analysis  on  the  memory  capacities  of  single-layer  and  two-layer  networks  is 
presented. 


INTRODUCTION 

A  feedforward  neural  network  could  have  one  or  several 
hidden  layers  of  neurons  and  a  single  layer  of  output  neurons. 
Information  flows  upward  from  the  lowest  hidden  layer,  which 
receives  the  inputs,  to  the  output  layer  of  neurons  from  which  the 
outputs  of  the  neural  network  are  retrieved.  Typically,  the  output  of 
each  neuron  in  the  network  is  a  semi-linear  function  of  the  inputs 
formulated  by 


O, 


HI' 


cOj.O, 


+e, 


(i) 


where  Oj  denotes  the  output  of  the  j-th  neuron,  coj ^  is  the  weight  for 
the  connection  from  the  i-th  to  the  j-th  neuron,  and  Oj'  is  the  i-th  input 

to  the  j-th  neuron.  The  function  f(.)  in  (I)  is  typified  by  the  sigmoid 
function 

f(x)  = - !—  (2) 


0+0 

The  sum  Ij<OjjOj+0j  is  often  referred  to  as  the  discriminant  function 

The  concept  of  first  and  second  order  discriminant  functions 
arise  quite  naturally  in  classification  problems  of  statistical  pattern 
recognition  (2).  Assume  that  it  is  desired  to  classify  a  d-dimensionat 
input  pattern  x  as  either  of  class  A0  or  of  class  A  | ,  given  that  the 
conditional  probability  density  functions  p(xlAg),  p(xlAj)  are 

Gaussian  with  mean  vectors  pQ  and  pj  and  covariance  matrices  R() 
and  R[,  respectively. 

This  problem  can  be  formulated  as  one  of  a  simple  hypothesis 
testing  Using  the  Bayes  or  the  Neyman-Pearson  test,  we  obtain  the 
following  second  order  discriminant  function, 

T(x)=(p[Rj,-pSRo')x+ixT(R0-1 


-R[')x  >A|x 


(3) 


where  T  is  some  appropriate  threshold.  If  the  distributions  of  the 
classes  Aq  and  A|  have  the  same  covariance  matrices,  (i  e.,  Rq=R  |  >. 
then  the  discriminant  function  in  (3)  defines  a  hyperplane  in 
d-dimensions.  When  the  covariances  are  not  equal,  fie 
then  the  discriminant  function  is  of  second  order. 


R0*R|>. 


SECOND  ORDER  NETWORKS 

High  order  generalizations  of  (1)  can  be  obtained  bv 
considering  high  order  discriminant  functions  as  follows,  [  3] 

x=Xc‘)J-Oi+X0)Jik°i'Ok'+  +6)  (4) 

i  i.k 

where  x  is  the  argument  in  the  sigmoid  function,  (2).  We  can  see  that 
there  are  more  degrees  of  freedom  in  a  higher  order  neuron  and  this 
will  increase  the  capabilities  of  the  neuron,  at  the  expense  of 
increasing  the  number  of  weights  needed  per  neuron 

The  second  order  discriminant  function  for  a  second  order 
neuron  is  obtained  from  (4)  by  excluding  all  terms  higher  than  second 
order,  hence 


(5) 


i.k 


Assuming  that  there  are  d  inputs  to  the  neuron,  then  there  are 
fd+l)(d+2V2  weights  per  neuron.  This  is  considerably  more  than  the 
conventional  (first  order)  neuron.  However,  second  order 
discriminant  functions  can  define  conic-surface  tvpes  of  decision 


regions  such  as  ellipses,  hyperbolas  and  hypcrplanes,  inslead  of  just 
the  hvperplane  decision  regions  available  with  the  first  order  neuron. 
An  advantage  here  is  that,  with  such  conic-surface  type  of  decision 
regions,  one  can  obtain  smoother  regions  with  less  number  of  hidden 

neurons. 

Simulation  Example 

The  characters  A',  'C,  E',  and  O'  are  represented  by  a 
3-by-3  matrix  of  one  bit  pixels  and  each  character  is  presented  to  a 
single- layer  feedforward  network  as  a  9-by-l  vector  (or  pattern)  The 
single  layer  network  is  composed  of  two  output  neurons  The 
network  is  then  trained  to  recognize  a  character  by  giving  a  two-bit 
decision  using  the  two  neurons.  Figure  1  shows  the  estimated  root 
mean  square  error  of  the  first  output  neuron  versus  the  number  of 
iterations  (number  of  presentations  of  the  character-set).  Figure  2 
shows  the  same  plot  for  the  second  output  neuron.  Both  figures 
demonstrate  that  second  order  neurons  can  be  trained  faster  than  first 
order  neurons. 


o  2rid  order 
I  st  order 


Figure  I :  RMSE  of  first  neuron  vs.  number  of  ueranons 


Figure  2:  RMSE  of  second  neuron  vs  number  of  iterations 

To  achieve  a  more  general  decision  region,  we  sould 
approximate  a  given  decision  region  as  j  collection  ol  ellipses  and 
take  the  union  of  these  elliptical  decision  recions  Each  elliptical 
decision  region  can  be  formed  with  a  single  second  order  neuron  and 
the  final  decision  region  is  found  bv  inline  the  union  i.t  log’s. il  <  >R 
operation)  of  the  elliptical  decision  regions  o|  ail  i he  second  order 
neurons 
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The  above  can  be  achieved  with  a  two-iayer  feedforward  neural 
network  composed  of  first  order  output  neurons  and  a  single  layer  of 
hidden  second  order  neurons,  as  shown  in  Figure  3. 


Figure  3:  Proposed  feedforward  network 
Simulation  examples  using  the  proposed  network  can  be  found  in  (4|. 

EXTENSIONS  OF  BACKPROPAGATION  ALGORITHM 
The  backpropagation  algorithm  [5],  is  easily  extended  to  the 
proposed  two-layer  network.  Given  the  inputs  and  the  desired 
outputs,  (X,u),(  Vju)  respectively,  the  objective  here  is  to  minimize 
the  following  cost  function 

e=iX^-°i)2  (6) 

i 

where  u  denotes  the  u-th  pattern  of  input/output  vectors  to  be  learned, 
and  Oj  is  the  actual  output  of  the  i-th  output  neuron.  Using  the 

negative  gradient  of  £  as  a  descent  direction  we  obtain 


(i)  For  weights  associated  with  the  output  neurons 

8j  =  Oj(l-Oj)(V“-Op  (7a) 

ojj.fk+l)  =  Wjjfkl+TiO.'S,  (7b) 

(ii)  For  weights  associated  with  the  hidden  neurons 

5j  -0/I-0j)]£“i<A  (8a) 

k 

Here,  the  index  k  refers  to  the  output  neurons. 

o)jj(k+l)  =  CDjifkj-HiOj'Sj  (8b) 

0)jim(k+ 1 )  =  Wjim(IO+TlOi'Om  6;  (8c) 

MEMORY  CAPACITY  OF  NETWORKS 


We  next  examine  the  problem  of  how  much  information  can  be 
stored  within  a  feedforward  neural  network  . 

To  begin  with,  define  fundamental  memories  as  the 
input-patterns  to  be  implemented  and  associated  memories  as  the 
corresponding  output-patterns. 

Definition  The  memory  capacity  of  a  given  neural  network  is  the 
maximum  value  of  K,  such  that  the  neural  network  can  map  any 
K-set  of  fundamental  memories  to  any  K-set  of  associated  memories 
We  make  the  additional  assumption  that  the  K-set  of 
fundamental  memories  are  in  general  position,  i.e.,  no  (d+1)  of  the 
fundamental  memories  on  the  same  (d-l)-dimensional  hyperplane. 

Then,  we  examine  the  memory  capacity  of  a  single-layer 
feedforward  neural  network  with  N  output  neurons  and  d  inputs. 

Theorem  1 

The  memory  capacity  of  the  single  - layer  feedforward  neural 
network  is  K  such  that 
K.  ■  d+1. 

Remarks 

(1)  The  proof  follows  from  our  definition  of  memory  capacity 
and  by  looking  at  a  single  neuron  in  the  network. 

(2)  In  die  case  where  the  K-set  of  fundamental  memories  are 
not  in  general  position,  the  capacity  K  is  upper  bounded  by  d+1, 
(i.e„KSd+l). 


We  next  look  at  the  memory  capacity  of  the  Madaline  network 
[6],  which  is  a  two-layer  feedforward  neural  network  composed  of  a 
single  output  neuron,  m  hidden  neurons  and  d  inputs. 

Theorem  2 

The  memory  capacity  of  the  Madaline  network  is  K  such  that 
K  -  md. 

Remarks 

(I)  The  proof  uses  I  cninn  I  ami  I  lirnn-m  I  in  Baiun  1 1 1 
tJ)  I  lie  .ilk p vc  result  agrees  with  (lie  intuitive  notion  tli.it 
increasing  the  number  of  hidden  neurons  increases  the  computational 
ability  of  the  neural  network. 

The  memory  capacity  of  a  two-layer  feedforward  neural 
network  composed  of  N  output  neurons,  m  hidden  neurons  and  d 
inputs  is  stated  in  liie  next  theorem. 

Theorem  3 

The  memory  capacity  of  the  two-layer  feedforward  neural 
network  is  K  such  that 
md/N  S  K  S  md. 

Remarks 

( 1 )  The  proof  uses  Theorem  2. 

(2)  The  second  order  discriminant  function  can  be  viewed  as  a 
mapping  from  d-space  into  (d+l)(d+2)/2  -space,  which  increases  the 
dimensionality  of  the  inputs.  Hence,  making  the  stronger  assumption 
that  the  new’  fundamental  memories  that  are  mapped  from  d  -space 

are  in  general  position,  we  apply  the  previous  theorem  to  the 
proposed  network  and  obtained, 

m(d+ 1  )(d+2)/2N<K<m(d+l  )(d+2)/2 

Thus,  using  second  order  discriminant  functions  increases  the 
memory  capacity  of  the  two  layer  feedforward  neural  network  This 
is  achieved,  however,  at  the  cost  of  increasing  the  number  of  synaptic 
weights  needed  per  neuron. 

CONCLUSIONS 

A  two-layer  feedforward  neural  network  is  presented  which 
consists  of  first  order  output  neurons  and  second  order  hidden 
neurons.  Analysis  on  the  memory  capacities  of  the  single-layer  and 
two-layer  feedforward  neural  networks  was  presented  and  applied  to 
the  proposed  network. 
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Department  of  Electrical  &  Computer  Engineering 
University  of  Notre  Dame 
Notre  Dame,  IN  46356 


ABSTRACT 

Statistical  analysis  of  a  recursive  parameter  estimation 
algorithm  is  performed.  The  algorithm  has  been  used  for  the 
estimation  of  parameters  of  ARX  processes  with  bounded 
noise.  Previous  analyses  of  algorithms  used  in  the  bounded 
noise  situation  have  been  essentially  deterministic. 
Unbiasedness  of  the  estimates  is  shown  under  the  assumption 
that  the  noise  is  white  and  zero  mean.  An  upper  bound  on  the 
covariance  of  parameter  estimates  is  derived.  An  improved 
version  of  the  algorithm  is  proposed  which  yields  smoother 
estimates  and  greaser  resistance  to  outliers. 


INTRODUCTION 

The  optimal  bounding-ellipsoid  (OBE)  algorithm  (1-3],  is 
a  recursive  parameter  estimation  algorithm  which  incorporates 
information-dependent  updating.  It  has  been  used  for 
estimating  the  parameters  of  autoregressive  processes  with 
auxiliary  inputs  and  bounded  disturbances  (ARX  processes). 
The  key  feature  of  the  OBE  algorithm  is  that  at  every  time  step, 
a  decision  is  made  whether  or  not  to  update  the  parameter 
estimates.  Simulation  results  have  shown  that  the  algorithm, 
on  the  average,  updates  only  20%  of  the  time.  This  fact  makes 
it  eminently  suitable  for  processing  multiple  channels 
simultaneously  is  it  makes  a  time-sharing  type  of  signal 
processor  feasible.  Another  advantageous  feature  of  the  OBE 
algorithm  is  that  after  the  algorithm  converges  the  parameter 
updating  stops.  This  is  particularly  useful  in  adaptive  control 
applications  where  non-cessation  of  parameter  updating  may 
lead  to  instability.  The  OBE  algorithm  has  been  used  for 
adaptive  signal  processing  applications  such  as  spectral 
estimation  and  adaptive  hybrid  balancing  [4],  In  this  paper, 
some  properties  of  the  OBE  estimator  will  be  investigated. 
The  analysis  of  bounded  error  algorithms  [1,3,6]  has  been 
essentially  deterministic.  The  statistical  properties  presented 
here  provide  a  better  understanding  of  the  algorithm  and  also 
facilitate  the  performance  analysis  of  the  algorithm.  In 
addition,  a  modification  of  the  algorithm  which  improves  its 
performance  will  be  proposed. 

THE  OBE  ALGORITHM 

Consider  the  ARX  model  formulated  by 

y(0  -  »iy(t-l)  +  ajyd-2)  +..+  a^yd-n)  +  t^ud)  + 
bju(t-l)  +..+  bmu(t-m)  ♦  v(t)  (1) 

where  y(t)  is  the  measurable  output,  u(t)  is  the  measurable 
input  and  v(t)  represents  the  unknown  but  bounded 
disturbance  at  time  instant  t .  The  ARX  equation  can  be  recast 
as 


y(t) »  $Td)  0*  ♦  v(t)  (2) 

where 

0*(t)  -  (a,(t) ,  aj  (t) . *n  (0  .bo  (0.  bi  d), ...  bm  (t)]  T 

is  the  vector  of  system  parameters  and 

<b(t)  ■  [  y(t-l),  y(t-2),  ...y(t-n),  u(t).  u(t-l),  ...u(t-m)]  T 
is  the  regressor  vector.  Given  the  input  and  output  sequences 
and  an  upper  bound  y  on  the  magnitude  of  the  noise,  the  OBE 


algorithm  can  be  used  to  estimate  the  parameters  a,  ....  a,  . 
bQ,.. ,!)„  .The  key  equations  defining  the  OBE  algorithm  of  [3] 
ate 

P  l(t)  - 

(1-X,)  P4(t-1)  +  Xt<&d)*T(t) 

(3a) 

8(t)  - 

9(1-1)  ♦  X,  PdWU  «<t) 

(3b) 

5<t)  - 

yd)  -  eT(t-i)®d) 

(3c) 

o\t)  - 

\  (1-Xp  8J(t) 

(3d) 

R‘(0)  > 

I  and  0  (0)  »  0,  <^(0)  »  I/e ,  e  «  I 

(3e) 

where  A 

0(t)  -  (a,(t>.  a j  (t) ...  £„  (t) .  ^  (t),  b,  (t) . ...  bm  (t)  ]  T 
is  the  vector  of  parameter  estimates  and 
G(t)  -  ®T(t)P(t-l)Od) 

In  this  paper,  the  information  dependent  updating  factor  X,  is 
determined  by  minimization  of  the  non-negative  quantity  otit) 
at  every  time  instant .  The  optimal  value  of  X,  could  be  zero, 
thus  implying  that  the  data  point  (y(t),  ud)  )  contains 
insufficient  information,  and  consequently  the  estimates  would 
not  be  updated.  Further  details  and  applications  of  the 

algorithm  can  be  found  in  [3,7]. 

UNBIASEDNESS  OF  THE  ESTIMATES 

The  following  analysis  assumes  that  the  noise  sequence 
[v(t))  is  white  and  is  zero  mean.  From  (3a)  and  (3e)  it  is 
easily  shown  that 

Pd) » i  ri ( 1  ■  v 1 1  +  £  c*«<i>(i)  4,1(0  t41 
>•1  *■ 1 

where 

l5) 

*«i*t 
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Substituting  (3s)  and  (3c)  in  (3b)  yields 
p‘(t)0(t)  -  (l-\)PJ(t-l)e(M)+  X,«(t)y(t)  (6) 

Hence 

Pl(t)0(t) -^q^i)  y(i)  +  (1-  X1)..(l-Xl)P',(0)6(0)  (7) 

>»t 

Since  the  inidtl  value  0(0)  ■  0  the  second  term  in  (7)  vanishes. 
Substituting  for  y(i)  from  (2)  yields 

8(0  *  P(t)£  q, ,  <*i)  «*»T(i)  0*  +  P(t)  £  q„®(i)  v(i)  (8) 

iat  ial 

Using  (4)  in  (8)  yields 


0(t)*0  -P(t) 


IIu  -  L )  0*  +  P0)£qt®(i)v<i) 


Since  0  £  Xj  <  1 .  it  is  generally  true  that  for  Urge  t. 

n*  .  i  (1-  Xj  )  is  very  small. Thus  the  second  term  in  the 
right  hand  side  of  (9)  can  be  neglected.  Consider  the 
subsequences  of  q^  ,<P(i)  and  v(i)  obtained  by  considering 

only  those  instants  for  which  X,t  and  hence  q*  is  non-zero  The 
third  term  in  (9)  can  be  replaced  by  a  summation  over  these 
subsequences  of  length  t'  S  t .  Taking  expectations  on  both 
sides  yields 


tin  E(0(t)}=  0* 


+  E{  P(t)  £ 


q«®(i)v(0} 


It  has  been  shown  [3]  that  if  the  input  U  "persistently  exciting” 
then  X,  tends  to  0  and  P(t)  is  bounded  asymptotically  .  It 
follows  then  that  P(t)  tends  to  a  constant,  say  P 
asymptotically.  To  simplify  the  above  equation  it  is  further 

assumed  here  that  v(i)  is  unconeUted  with  <b(i)  and  q^  .  for 
time  instants  when  X,  is  non-zero.  Simulation  studies 
confirmed  that  the  correlation  coefficient  between  q^  and  v(t) 
is  very  close  to  zero.  A  possible  explanation  far  this  is  that  q* 
which  U  actually  the  product  of  X, ,  (1  -  XM  ), ...  (  1  -  X,. )  is 

dependent  on  the  data  set  (y,  ...  y,  )  as  X,  depends  on  y(i), 

y(i-l) . y(l).  A  change  in  v(i)  will  not  really  affect  the 

calculated  value  of  %•,  as  long  as  t  and  t'  are  Urge  enough. 
In  addition,  4X0  and  v(i)  are  unconeUted,  as  (v(i)|  is  a  white 
noise  sequence  Therefore 

E  (  q* 4X0  v(i)  )  -  E(q*<t*i)|E(  v(i)) 


From  (10)  it  follows  that 


ta  E  { 0(t)  } 


i’fE{p£ 


q„*(i)  }  E  { v(i)  } 


“d  since  (v( )  is  zero  mean 

Km  E  {  0  (t) )  -  0* 

— 

Thus  the  estimator  is  shown  to  be  asymptotically  unbiased. 


ASYMPTOTIC  COVARIANCE 

The  asymptotic  covariance  of  the  estimates  is  a  good 
measure  of  the  steady  state  performance  of  recursive 
estimation  algorithms.  Finding  an  exact  expression  for  the 
covariance  of  the  OBE  estimates  is  a  formidable  task  because 
of  the  presence  of  the  factor  q*  in  (9) .  This  factor  is  related  to 
the  data  in  a  highly  non-linear  fashion.  However,  after  making 
a  few  simplifying  assumptions,  an  upper  bound  for  the 
covariance  can  be  obtained 

With  the  same  assumptions  as  before  (9)  can  be  reduced  tc 
0(0  -0*  =  P(t)  v(i)  (11) 

Let  l'(t)  -  0  (t)  -  0*.  then  since  P(t)  -  PT(t) 

0(t)«  T(t)»  P(0  itS  q*q®(i)®T(j)v(i)vO)]  P(t) 

i-i  j-i 

-P(t)  &£  q^dXO  ®T(j)v(i)  v(j)]  P(t) 

i«l  j*i*  1 

+  PCO  £  t£«(i)*T(i)v\i)  P(t) 

i-1 

♦  P(t)  Its  qj<®(i)  ®T(j)v(i)  v(j)  P(t)  (12) 

j«l  i  *j*l 

It  is  assumed  here,  aa  before,  that  for  Urge  t  the  factor  qlt .  at 
the  updating  instants,  U  unconeUted  with  vjj)  for  all  i  and  j, 
and  that  P(t)  tends  to  a  constant  P  asymptotically.  Also  since 

( v(i) )  is  while,  <b(i)  U  unconeUted  with  v(j)  for  all  j  2  i  This 
would  imply  that  in  (12),  the  term  (q*  qj,®(i)  <PT(j)  v(i) )  is 
unconeUted  with  v(j)  for  all  j  >  i  and  the  term  (q^  qJt  <b(i) 

®T(j)  v(j) )  is  unconeUted  with  v(i)  for  all  i  2  j.  Taking 
expectations  on  both  sides  of  the  above  expression  yields 

lim_E{0(t)0T(t)}» 

Um  E{  Si  Pq,q,<XKi)  <t>(i)Tv(i)P}  E{  v(j) ) 

1_*“ 

+  E{Pq2<Xi)  <PT(i)P}oi 

i-1 

+  £  { *» qk q, •*»<>)  <»T0)v(j)}  E{v(i) )  (13) 


j.i  i-j* i 

where  0,2  is  the  variance  of  the  white  noise  sequence(v(i)). 
Since  (v(i) ) is  zero  mean,  the  first  and  last  terms  of  (13) 
vanish  and  thus 

lim  E  {  0(t)0'  T(t)  }  -  oj  Xe{P  q.qt<b(i)d>T(i)  P  ) 

'-***  i-i 

Since  q^  <b(i)  ®T(i)  is  positive  semi-definite  and  since 
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0  £  qj,  S  o,  where  a  i>  a  user  chosen  upper  bound  on  X,  it 
follows  that 


From  (4),  for  sufficiently  large  t, 

q^t*!)  <t>T(i)  -Pl(t) 

i»l 

and  since  P(t)  -*  P,  this  would  imply  that  P-1  (t)  -»  P-* . 
Hence  for  sufficiently  large  t 

e { e<t) e T(o }  s  aujEiP]  (i4) 

or 

E  { tr  1 8(t)  9  T(t)  1  }  £  a  oj  E  (tr  {  P  ]  )  (15) 

If  the  input  sequence  is  persistently  exciting,  it  can  be  shown 
[3]  that  there  exists  a  constant  4  such  that  P  S  4  I .  Therefore 
from  (14) 

lim  E  {  8<t)  ©  T(t)  }  S  a  a2  4 1 

1  -*  -  * 

It  must  be  emphasized  that  this  upper  bound  is  rather 
conservative,  because  the  maximum  value  of  q^  will  actually 

be  much  lower  than  a. 

IMPROVED  ALGORITHM 

In  some  applications,the  parameter  estimates  obtained  by 
applying  the  OBE  algorithm  fluctuate  excessively.  This 
naturally  causes  the  algorithm  to  have  a  larger  steady  state 
error  and  parameter  covariance  as  compared  to  the  recursive 
least-squares  algorithm.  These  fluctuations  are  highly 
undesirable.  A  possible  explanation  for  this  unwanted 
property  is  presented  below. 

The  OBE  algorithm  of  [3]  has  the  property  that  the 
quantity  o2(t),  which  is  related  to  the  size  of  the  optimal 
bounding  ellipsoid  E, ,  is  monotone  non-increasing  .  When 
the  size  of  the  ellipsoid  E,  does  decrease,  thus  decreasing  the 
size  of  the  region  in  which  the  estimates  are  constrained  to  lie, 
the  parameter  estimates  themselves  which  correspond  to  the 
center  of  E,  may  shift  considerably.  This  shifting  of  the  center 
of  the  optimal  bounding  ellipsoid  accounts  for  the  fluctuation 
of  parameter  estimates. 

For  the  OBE  algorithm,  the  parameter  estimate  update 
equation  is 

9<t)  m  Oft- 1)  +  X,  P(t)<*t)  8<t) 

Thus  the  amount  of  variations  of  the  estimates  is  directly 
proportional  10  Hence  one  way  to  reduce  the  fluctuation  is 

to  reduce  the  value  of  Xr  However  if  X,  is  reduced  by  the 
same  amount  for  all  t  ,  then  the  tracking  ability  of  the 
algorithm  deteriorates  and  the  bounding  ellipsoid  at  every 

iterance  t ,  will  no  longer  be  optimal  in  the  sense  that  a2  (t)  is 
minimum. Therefore  a  compromise  value  of  X,  is  required 
which  does  reduce  the  amount  of  fluctuation  and  also  retains 
the  tracking  property.  A  possible  solution  is  to  multiply  the 


value  of  X  calculated  at  each  iteration  by  a  scaling  factor  M(t), 
which  is  small  for  isolated  updates  Out  close  to  unity  for  a 
series  of  successive  updates.  A  possible  choice  for  the  scaling 
factor  is 

M(t)  ■  exp(  -$  /  nup(t) )  (17) 

with 

s  a  user  chosen  constant  in  the  range  0  <  s  <  10.  and 
nup(t)  *  0  if  X,  *  0 

otherwise 

nup(t)  »  nup(t-l)  +  1 

If  there  is  a  string  of  updates,  as  would  be  the  case  if  die 
algorithm  is  trying  to  track  the  parameters,  then  only  the  first 

X  in  the  string  would  be  decreased  significantly  and  the  rest 
would  be  decreased  marginally.  Thus  isolated  updates  which 
may  be  due  to  noisy  observations  are  reduced  significantly 
thereby  reducing  the  "random”  fluctuation  in  parameter 
estimates.  The  modified  algorithm  thus  has  greater  resistance 
to  outliers.  A  similar  scaling  factor  has  been  used  to  reduce  the 
steady  state  mean-square  prediction  error  in  the  exponentially 
weighted  least-squares  algorithm  [5], 

It  must  be  emphasized  that  the  choice  of  the  scaling  factor 
used  here  is  somewhat  arbitrary.  A  scaling  factor  is  required 
which  wiil  be  small  for  nupft)  S  2  and  will  be  close  to  unity 
for  larger  values  of  nup(t).  The  scaling  factor  used  here 
satisfies  these  requirements.  Simulations  have  been  performed 
to  test  the  effectiveness  of  the  scaling  factor  and  the  results  are 
presented  in  the  next  section. 

SIMULATION  RESULTS 

In  order  to  verify  the  above  analysis,  Monte  Carlo 
simulation*  were  performed.  The  OBE  algorithm  of  [3]  was 
applied  to  100  different  data  sets  generated  by  the  foUowin* 
AR  (2)  model  * 

y(t)  -  -0.4  y(t-l)  -  0.85  y(»-2)  v(t) 

where  (v(t)|  is  an  i.i.d.  sequence  having  a  uniform 
distribution  in  [  -1.0,  1.0  ].  Each  data  set  contains  1000  data 
points.  For  each  dam  set,  the  estimate  vector  obtained  at  each 
iteration  was  subtracted  from  the  true  parameter  vector.  The 
sample  bias  defined  by 

T5J 1 1  •(.)-»•! 

was  computed  for  t «  1  to  1000.  In  Figure  1,  this  average  is 
plotted  against  the  iteration  index  t  As  the  number  of 
iterations  increases,  the  bias  becomes  closer  to  0.  thus 
experimentally  verifying  the  asymptotic  unbiasedness  of  the 
estimator. 

The  sample  variance  for  each  component  of  the  estimate 
vector  defined  as 

T3o|!(9.  <!•'■>  -  0-0)  ]2 

where  8t(j,:)  is  the  j  'th  component  of  the  estimate  of  the  Hh 
data  set  at  time  instant  t,  is  calculated  for  t  going  from  1  to 
1000.  The  sample  variances  for  each  component  are  sdiW 
and  plotted  in  Fig.  2  against  the  iteration  index  t  .  The 
asymptotic  value  . 
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P  ”  100  ^  PioooW 

where  P1000  (i)  is  the  the  matrix  P(1000)  obtained  from  the 
i'th  data  set  The  upper  bound  is  computed  using  (15)  with 
a  »  0.5  and  0,3  -  0.33  .  The  computed  upper  bound  is 
rqual  to  0.84. 

The  OBE  algorithm  of  (3],  with  and  without  the  scaling 
factor,  was  applied  to  data  generated  by  the  above  AR  (2) 
model.  A  scaling  factor  with  s  »  2,  (c.f.  (17) ),  was  found  to 
yield  good  results.  The  normal  and  smoothed  parameter 
estimates  are  plotted  against  the  iteration  index  t  in  Figure  3  . 

It  is  clear  from  the  plot  that  using  a  scaling  factor  on  X. 

reduces  me  amount  ot  fluctuation  and  yields  smootner 
parameter  estimates. 


CONCLUSION 

We  have  performed  an  analysis  of  certain  statistical 
properties  of  a  new  recursive  estimation  algorithm.  The 
estimator  is  shown  to  be  asymptotically  unbiased  and  an  upper 
bound  on  the  asymptotic  covariance  of  parameter  estimates  has 
been  derived  with  some  reasonable  assumptions.  The  analysis 
has  been  verified  through  Monte  Carlo  simulations  of  a 
particular  AR(2)  model.  Finally  an  improved  algorithm  has 
been  proposed  and  shown  to  yield  smoother  estimates. 
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ABSTRACT 

Recently,  there  appears  to  be  a  resurgence  of  interest  in  the  area  of  membership- set 
parameter  estimation.  One  such  algorithm,  the  so-called  Optimal  Bounding  Ellipsoid  (OBE) 
algorithm,  proves  to  be  appealing  in  theory  and  practice.  The  algorithm,  which  features  a 
discerning  update  strategy,  was  first  derived  for  the  recursive  estimation  of  parameters  of 
autoregressive  with  auxiliary  input  (ARX)  models.  This  paper  investigates  an  extension  of  the 
algorithm  to  ARMA  parameter  estimation.  The  convergence  analysis  is  complicated  due  to  the 
discerning  update  strategy  which  incorporates  an  information  dependent  forgetting  factor.  It  is 
shown  that  if  the  input  noise  is  bounded  and  if  the  process  is  stable,  then  the  a  posteriori 
prediction  error  is  bounded  even  without  the  SPR  condition.  This  is  in  sharp  contrast  to  the 
crucial  role  of  the  SPR  condition  in  the  ELS  and  output  error  algorithms. 

L  INTRODUCTION 

The  autoregressive  moving  average(ARMA)  parameter  estimation  problem  arises  in  many 
adaptive  signal  processing  applications  such  as  speech  processing,  seismic  data  processing  and 
channel  equalization.  Typically,  samples  of  the  measured  signal  y(t)  are  modeled  as  the  output 
of  an  IIR  filter  driven  by  unknown  white  noise  w(t)  [  1  ].  The  ARMA  model  is  described  by  the 
temporal  recursion 

y(t)  =  a[  y(t-l)+..+  an  y(t-n)  +  w(t)  +  c,  w(t- 1)  +..+•  cr  w(t-r)  (1) 

Fitting  this  ARMA  model  to  the  measured  data  y(t),  t  =1..T  ,  requires  the  estimation  of  the 
parameters  at  ...  an  ,  c,..,  cr .  Recursive  schemes  like  the  extended  least-squares  (ELS),  the 
recursive  maximum  likelihood  (RML)  and  multi-stage  least-squares  algorithms  have  been  used 
to  estimate  ARMA  parameters  [2,3].  The  ELS  algorithm  uses  the  a  posteriori  prediction  error 
e(t),  as  an  estimate  of  w(t).  The  regressor  vector  is  formed  from  y(t- 1 ),..,  y(t-n)  and  e(t-l),.., 
e(t-r).  The  standard  recursive  least-squares  (RLS)  algorithm  is  then  employed  to  update  the 
estimates.  The  algorithm  is  conceptually  simple  but  restrictive  in  the  sense  that  convergence  of 
the  algorithm  can  be  assured  only  if  the  underlying  transfer  function  H(q_1)  =  1/  C(q'~)  -  1/2 
is  strictly  positive  real  (SPR),  with  q"^  being  the  delay  operator  and 

C(q'‘)  =  1  +  c,  q'1  +  c2  q-:  +  ..  +  cr  q  r 

The  RML  algorithm,  which  uses  a  filtered  version  of  the  regressor  vector  used  in  the  ELS 
algorithm,  does  not  require  H(q'b  to  be  SPR.  However  the  estimates  have  to  be  monitored 
and  projected  into  a  stability  region  to  ensure  convergence^]. 

In  addition  to  the  aforementioned  least-squares  based  methods,  there  exists  a  different  class 
of  estimation  algorithms  that  estimate  membership  sets  of  parameters  which  are  consistent  with 
the  measurements,  model  structure  and  noise  constraints  [4]-{8].  These  algorithms  are 
particularly  useful  when  the  noise  distribution  is  unknown  but  constraints  in  the  form  of 
bounds  on  the  instantaneous  values  of  the  noise  are  available.  The  model  and  noise 
representations  commonly  used  are 

y(t)  =  0*  T  <I>(t)  +  v(t),  lv(t)l  <  r1/2(t) 

where  (y )  is  a  sequence  of  scalar  observations,  9*  is  the  parameter  vector  to  be  identified,  <l>(t) 
is  a  n-vector  of  variables  known  at  time  t,  (v|  is  the  noise  sequence  and  ±  rl/2(t)  are  the  time 
varying  noise  bounds.  Given  a  sequence  { y(i),<t>(i) } ,  i=l  ..k,  the  opdmal  membership  set 
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Sj  =  (  0  :  (  y(i)  -  0T  0(i)  )2  <  r(i),  0  e  Rn  ) 

From  a  geometrical  point  of  view  Sj  is  a  convex  polytope  in  Rn  and  contains  the  true  parameter 
vector.  Finding  is  computationally  intractable  and  it  is  therefore  necessary  to  approximate 
yO  by  some  set  which  encloses  it  tightly  and  which  can  be  described  and  updated 
economically  [9].  The  different  membership-set  algorithms  differ  in  the  way  the  optimal 
membership  set  is  approximated  and  in  the  method  used  to  obtain  a  tight  enclosure. 

Among  these  algorithms  based  on  membership  sets,  a  seminal  recursive  algorithm  is  the 
so-called  optimal  bounding  ellipsoid  (OBE)  algorithm[6-8].  One  of  the  main  features  of  this 
algorithm  is  a  discerning  update  strategy.  This  feature,  obtained  by  the  introduction  of  an 
information  dependent  updating/forgetting  factor,  yields  a  modular  structure  thereby  increasing 
the  potential  for  concurrent  and  pipelined  processing  of  signals.  The  presence  of  such  a 
forgetting  factor  also  gives  the  algorithm  the  ability  to  track  time  varying  parameters.  The 
algorithm  has  the  advantageous  feature  of  automatic  asymptotic  cessation  of  updates  in  the 
fixed  parameter  case. 

In  this  paper,  we  extend  one  of  these  OBE  algorithms[8]  to  the  ARMA  case.  For  the 
ARMA  parameter  estimation  problem,  the  OBE  algorithm  cannot  be  applied  in  its  present  form. 
However,  by  assuming  that  the  input  white  noise  is  bounded  in  magnitude,  the  OBE  algorithm 
can  be  extended  in  a  manner  similar  to  the  ELS  algorithm.  The  convergence  analysis  of  the 
resulting  algorithm,  as  opposed  to  that  of  the  ELS  algorithm,  is  deterministic  and  is  performed 
under  the  assumptions  that  the  process  is  stable  and  that  the  noise  is  bounded.Th e  a  posteriori 
prediction  error  is  shown  to  be  bounded  without  imposing  any  SPR  condition.  This  is  in 
contrast  to  the  convergence  analysis  of  the  ELS  or  output  error  algorithms  in  which  the  SPR 
condition  is  used  to  prove  boundedness  of  the  prediction  errors  and  convergence  of  parameter 
estimates]  10].  By  imposing  a  persistence  of  excitation  condition  on  the  regressor  vector,  the  a 
priori  prediction  error  of  the  extended  OBE  algorithm  is  shown  to  be  bounded  and  the 
parameter  estimates  are  shown  to  converge  to  a  neighborhood  of  the  true  parameter  vector. 

The  paper  is  organized  in  the  following  manner.  In  Section  II.  a  brief  review  of  the  OBE 
algorithm  and  its  properties  is  presented.  In  Section  III.  the  algorithm  is  extended  to  ARMA 
parameter  estimation.  Convergence  analysis  of  the  extended  algorithm  is  performed  in  Section 
IV.  The  performance  of  the  algorithm  is  compared  with  the  ELS  algorithm  through  simulation 
studies  in  Section  V.  Section  VI  concludes  the  paper. 


II.  THE  OBE  ALGORITHM 

Consider  the  ARX  model  described  by 

y(t)  =  at  y(t-l)+..+  any(t-n)  +  b0u(t)  +  bj  u(t-l)  +..+  bmu(t-m)  +  v(t) 

The  above  equadon  can  be  recast  as  . 

y(t)  =  0*  T <J>(t)  +  v(t)  -i 

where 

0  =  [a(  ,  a-,  ...  an  ,  b0  ,  bj . bm  ]T 

is  the  vector  of  true  parameters  and 

<t»(t)  =  [  y(t-l),  y(t-2), ..  y(t-n),  u(t),  u(t-l),  ..  u(t-m)  ]  T 
is  the  regressor  vector.  It  is  assumed  that  the  noise  is  uniformly  bounded  in  magnitude,  i.e  , 
there  exists  y0  >  0  ,  such  that 

v  2(t)  <,  y02  for  all  t,  hence 

(y(t)  -  0^0(0  )2  <  Y02 

Let  St  be  a  subset  of  the  euclidean  space  R  n  *  m  ,  defined  by 
St=  (0:  (  y(t)  -  0T  O(t))2  <  Y02  ,0eRn*m  +  1  ) 
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The  OBE  algorithm  starts  off  with  a  large  ellipsoid,  E0  ,  in  Rn+tn^1  which  contains  all  possible 
values  of  the  modelled  parameter  0*.  After  the  first  observarion  y(l)  is  acquired,  an  ellipsoid 
is  found  which  bounds  the  intersection  of  E0and  the  convex  polytope  Sj.This  ellipsoid  must 
be  optimal  in  some  sense,  say  minimum  volume[6-7j  or  by  any  other  criterion[6-8],  to  hasten 
convergence.  Denoting  the  optimal  ellipsoid  by  E,  one  can  proceed  Exactly  as  before  with 
the  future  observations  and  obtain  a  sequence  of  optimal  bounding  ellipsoids  {  Et ).  The 
center  of  the  ellipsoid  Etcan  be  taken  as  the  parameter  estimate  at  the  t-th  instant  and  is 
denoted  by  8(t).  If  at  a  particular  time  instant  i,  the  resulting  optimal  bounding  ellipsoid  would 
be  of  a "  smaller  size  ",  thereby  implying  that  the  data  point  y(i)  conveys  some  "  information  " 
regarding  the  parameter  estimates,  then  the  parameters  are  updated.  Otherwise  E(  is  set  equal  to 
E( ,  and  the  parameters  are  not  updated. 

Let  the  ellipsoid  Et.t  at  the  (t-l)-th  instant  be  formulated  by 

Et  l=  {  0:  (  0  -  9  (t- 1)  )T  P  1  Ct-1)  (  0  -  0  (t-l) )  <  a2  (t-1)  ) 
for  some  positive  definite  matrix  P  (t- 1 )  and  a  non-negative  scalar  a2  (t- 1 ).  Then,  given  y<  t), 
an  ellipsoid  which  bounds  Et  j  S,  "tightly"  is 

{  0  :  (1  —  )  (0  —  9  (t-l)  )T  p'(t-l)  (0-0  (t-l)  )  +  X,  (  y(t)  -  0T<D(t)  )2 

<  (1  -Xt)oVl)  +  \y0  )  (3) 

where  the  forgetting  factor  A.(t)  satisfies  0  <  A.(t)  <1.  The  size  of  the  bounding  ellipsoid  is 
related  to  the  scalar  a2  (t-l)  and  the  eigenvalues  of  P(t- 1).  The  update  equations  for  9ft),  Pft) 
and  o2(t)  are  derived  in  [8].  The  optimal  ellipsoid  which  bounds  the  intersection  of  Et.j  and  St 
is  defined  in  terms  of  an  optimal  value  of  X(.  For  the  OBE  algorithm  of  [8],  the  optimum 
value  A.  t  is  determined  by  minimi^ition  of  c2(t)  with  respect  to  Xl  at  every  time  instant.  The 
minimization  procedure  results  in  A.  ,  being  set  equal  to  zero  (no  update)  if 

At)  +  52(t)  5^(0  '  (4) 

If  (4)  is  not  satisfied,  then  the  optimal  value  of  A.,  is  computed.  The  parameter  estimation 
procedure  is  depicted  in  Fig.  1.  An  outgrowth  of  the  modular  recursive  estimation  procedure  is 
a  parallel-pipelined  networking  structure  [1 1  J.  The  algorithm  is  such  that  the  computational 
complexity  of  the  infotmadon  evaluadon  (IE)  procedure  is  much  less  than  that  of  the  updating 
procedure  (UPD).  Since,  in  general,  a  good  number  of  data  samples  would  be  rejected  by  the 
IE,  both  the  EE  and  the  UPD  would  involve  significant  amounts  of  idle  rime.  A  viable  scheme 
then  is  to  configure  a  parallel-pipelined  network  comprising  of  such  modular  estimators  to 
process  signals  from  multiple  channels.  Apart  from  reducing  hardware  costs,  such  a  scheme 
would  offer  increased  reliability  since  the  failure  of  one  UPD  processor  would  not  cause  any 
of  the  channels  to  fail,  in  contrast  to  a  system  with  a  dedicated  UPD  processor  for  each 
channel. 


INFCPMA~:CN  ==CCESSCR  UFCATNG  PROCESSOR 
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IH.  EXTENSION  TO  ARMA  MODELS 


The  ARMA  model  described  by  (1),  can  be  rewritten  as 

w(t)  =  y(t)  -  0*T  <t>'(t)  (5) 

where  8*  is  the  vector  of  true  parameters  and  is  now  defined  by 
9  =  [at  ,  a^  Sjj  ,  c«  ,  cr 

O’(t)  =  [  y(t-l)  ,  ...  y(t-n)  ,  w(t-l) . w(t-r)  ]  T 

Here  again,  w(t)  is  assumed  to  be  bounded  in  magnitude  by  Yn-  Since  the  values  of  the  noise 
sequence  (w(t)  )  are  not  available,  the  regressor  vector  Ov(t)  is  not  known  exactly,  if 
however,  at  time  t,  an  esdmate  of  8*. 

T 


8  ( t )  =  [  aj(t), ...  an  (t)  Cj  (t),  ...cr  (t) 
is  available,  w(t)  could  be  estimated  by  its  estimate  e(t)  according  to 
e(t)  •  y(t)  -  8  T(t)<t»(t) 


(6) 


where 

O(t)  =  [  y(t-l)  ,  ..  y(t-n) ,  e(t-l) ,  ...  e(t-r)  ]  T  1 7) 

It  follows  from  (5)  and  the  boundedness  of  I  w(t)  I  that 
(  y(t)  -  e*T  <t>-(t)  )2  <  Yo2 

Hence  if  e(t)  *  w(t),  then  <J>'(t)  =  O(t)  and  for  a  suitable  y>0,  such  that  y  ‘  >  Yq2 

(  y(t)  -  8*T 0(0  )2  <  Y2  (8) 

Thus  8*  €  S. ,  where  S  is  defined  bv 

St  =  {  8  :  (  y(t)  -  8T  0(0  )2  }  <  Y  2 . 9  e  R  n  +  r  } 


Hence  if  the  difference  between  e(t)  and  w(t)  is  small,  applying  the  OBE  algorithm  will  vidd  a 
sequence  of  bounding  ellipsoids  ( E,  jin  the  parameter  space.  If  (8)  holds  for  all  t  and  9*e  En 
then  it  is  easy  to  see  that  8*  e  E,  for  all  time  instants  t.  The  optimal  bounding  ellipsoid  E.  is 
described  by 


E,- 

{  8  e  Rn  +  r  :  (  8  -  8  (t)  )T  P‘(t)  (8-9(0)  ^  a2(t)  } 

•9) 

and  the  update  equations  which  follow  directly  from  [8]  are 

P''(t) 

=  (l-X^p  'd-l)  +  X.t 0(0  0T (t) 

''  ;0  a) 

0(0 

=  9(t-l)  +  \  P(t)0(t)  o(t) 

1 10b) 

8(t) 

=  y(t)  -  9t  (t- 1)0(0 

i  !0c) 

a2(t) 

Xt(l-Xt)S2(0 

,  (,.x,)o w+K,r- 

<  lOd) 

where 

G(t)  =  0T(t)P(t- 1)0(0 

The  above  recursive  relations  along  with  the  initial  values 

P'HO)  =  1 ,  8  (0)  =  0  and  cr(0)  =  1/e  with  e  «1 

1 10ej 

form  the  basis  of  the  extended  optimal  bounding  ellipsoid  (EOBE)  estimation  algorithm! III. 
The  matrix  inversion  lemma  can  be  used  to  obtain  the  recursion  for  P(t) 

1  r  X  P(t-l)0(t)0T(t)P(t-l) 

P(t)  =  - [P(t-l)--! - 

1-X(  l-X+XtG(0 

It  is  easily  shown  from  (lOd)  that  for  the  EOBE  algorithm,  minimizing  o2(t)  with  respect  to  L 
at  every  time  instant  yields  the  same  updating  criterion  (4)  and  the  same  algorithm  tor 
determining  the  optimum  value  of  the  forgetting  /  updating  factor  k*  ,  as  in  [8].  The  algonthm 
thus  retains  the  discerning  update  strategy  and  the  modular  adaptive  filter  structure 


] 
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Finding  a  value  of  y 2  to  ensure  that  (8)  holds  for  all  time  instants  is  not  easy.  Instead,  we 
will  show  in  the  next  section  that  choosing  y  2  to  be  an  upper  bound  on  the  square  of  the 
magnitude  of  the  output  y(t)  will  ensure  that  the  parameter  estimates  obtained  by  applying  the 
EOBE  algorithm  converge  to  a  neighborhood  of  the  true  parameters. 

If  the  ARMA  process  is  stable  then  if  lw(t)l  is  bounded  so  is  ly(t)l.  The  output  process  v(t) 
can  be  expressed  as 

y(t)  =  hit)  *  w(t) , 

where  *  denotes  discrete  convolution  and  h(t)  is  the  equivalent  impulse  response.  Stability  of 
the  process  implies  that  there  exists  a  finite  M  such  that 


<  M  <oo 


and  if !  w(t)  I  <  yQ  then 


*2 


h(i)  I  I  wft-i)  I  <  M  yo  =  y 


In  order  to  obtain  a  value  for  y,  the  user  would  therefore  need  the  bound  on  lw(t)l  and  an 
estimate  of  M.  Or  alternatively,  the  outputs  could  be  monitored  and  a  bound  on  ly(t)l  could  be 
obtained  before  starting  the  actual  parameter  estimation  procedure.  A  loose  upper  bound  on 
either  M  or  the  magnitude  of  the  output  would  suffice  since  simulations  have  shown  that  using 
values  for  y  that  are  several  times  larger  or  smaller  than  the  actual  bound  on  ly(t)l  has  no 
deleterious  effects  on  the  quality  of  estimates  or  convergence  rate. 

rv.  CONVERGENCE  ANALYSIS 


We  will  first  show  that  if  the  updating  factor  sequence  is  that  which  minimizes  a2  at  every 


instant  (  denoted  bv 


)  then  the  parameter  estimates  converge.  To  show  that,  the 


following  lemma  will  be  needed. 

Lemma  1.  If  the  magnitude  of  the  noise  wit)  is  upper  bounded  and  the  process  is  stable  then 
cr(t),  defined  in  (lOd),  is  always  non-negative  provided  that  the  initial  values  9(0),  P'l(0)  and 
o2(0)  are  chosen  such  that  9T(6)P‘1(0)9(0)  <  cr  (0) 


Proof.  From  (10a),  (10b)  and  (10c) 


e  T(t>  p  '(t)  e  (t)  =  (i-x  leVop'it-iiOit-D  +  M  v2<t)  -  s2(t)  <  i-  AoT(t)  p(t>  ott)  i] 

l  ^  4  -  i 


=  ( l-  a.,  )  ( l-  x2 )  ...d-/..  i  e  T)0)  p  '(0)  e  (0) 


Xqic[y:(> 


i)  -  b'liM  1  -  O  (i)  P(i)  <E>(i)  ) 


where 


q„  =  yi-y, ) ...  (i- y  2  o  v  1. 1. 


Since  the  process  is  stable  and  ( w(t) )  is  bounded,  there  exists  a  y 2  such  that  y  2(t)<y2 
The  posinve  semi -definiteness  of  P"‘(t)  will  therefore  imply  that 
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t  t 

fld-X.)  efyjP'VoMO)  +  2^qit[y2-5\i)(  1 -\<tT(i)P(i)<I>(i) 

■  i  i  3 1 


]  >o 


But  from  (ItO 


_  X.  G(i) 

X.  C>T(i)  P(i)  <X>(i)  =  - 1 - 

i  i  .  t  r 


l-X.  +  X.G(i) 

\  i  v  r 


Using  (15)  in  (14)  yields 

V“T  x  2  2  1  "X. 

|T(1-X.)  e  (0)P  (0)8(0)  +>qit[Y  -5(i) - -  )  >0  (16) 

7=7  '  .  =  i  l-X.  +  XjG(i) 

From  (lOd),  a  non-recursive  expression  for  a2  (t)  can  be  obtained  as 

,  2  2  52(i)  (  1-  X. ) 

c(t)  =  (1-Xt)(l-X2)  ...(l-Xt)o‘(0)  +  2^t[Y  *— — — ”  1  (17) 

i  =  l  1  '  Xj+  A.  0(1) 

Since  a2  (0)  S  8T(0)P'l(0)8(0),  (17)  implies  that  o2(t)>0  for  all  t .  This  result  will  hold 

for  any  sequence  of  forgetting  factors  {Xj  )  with  0  £  Xt  <  1. 

Theorem  1.  If  the  assumptions  of  Lemma  1  are  satisfied  then 

1.  lim  e2(t)  e  [  0,  y2]  (18) 

l - »  90 

2.  lim  a(t)  e  [  0,  y2]  (19) 

Proof.  See  [13]. 

Note  that  throughout  this  section^  expressions  like  (18)  should  not  be  taken  to  mean  that 
lirr^  ,,.e2(t)  exists  but  rather  that  e*(t)  becomes  asymptotically  less  than  or  equal  to  r 


The  next  lemma  relates  the  positive  definiteness  of  P4(t)  to  the  richness  of  the  regTessor  vector 

Lemma  2.  If  there  exist  positive  a,  and  N  such  that  for  all  t 
t  +  N 

y  0(1)  0T(i)  >  a3  I  >0  (70) 

1 

then  there  exists  a  positive  a4  such  that 
P'l(t)  >  a4  l  >  0 

Proof  of  the  lemma  is  the  same  as  that  of  Theorem  4. 1  of  [8],  it  is  thus  omitted  here. 

Remark.  The  positive  definiteness  of  P  !(t)  implies  that  the  eigenvalues  of  P(t)  are  upper 
bounded. 

Theorem  2.  If  assumptions  of  Lemma  1  are  satisfied  and  (20)  holds  then  the  EOBE 
algorithm  ensures  : 

(a)  Parameter  difference  convergence 
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lim  II  0(t)  -  0(t-k)  II  =  0  (21) 

for  any  finite  k. 

(b)  Bounded  a  priori  prediction  errors 

lim  S2(t)  6  [  0  ,  y  2  ]  (22) 

t  — *  00 

(c)  Bounded  parameter  misadjustment 

lim  110(0-0’ II 2  <  My2  <  °°  (23) 

t  — ► 

for  some  finite  M. 


Proof. 

(a)  From  (10b)  and  (1 1) 


0(0-  0(t-l) 


X.t  2  <l>T(t)  P2  (t- 1 )  <t(t)  82(t) 
(  l-\  +  \  G(t))2 


S  emax(  P(t-l)  ) 


\/G(t)62(t) 

(1  -X,  +  X,  G(t) )‘ 


(24) 


If  (20)  holds  then  by  Lemma  2,  e—^  {  P(t-l)  ),  the  maximum  eigenvalue  of  P(t-l),  is 
bounded  for  all  t.  The  other  term  on  tne  right  hand  side  is  shown  to  tend  to  zero  in  the  proof  of 
Theorem  1 .  Thus 


110(0-0(1-1)11  -*0  (25) 

Applying  the  Minkowski  inequality  to  II  0(0  -  0(t-k)  II  and  using  (25)  completes  the  proof  of 
(21).' 

(b)  From  ( lOe)  and  ( 1 3)  it  follows  that 

G(t)S  em»  (PO-i)  )[ny2+  r  max  e2(i)  ]  (26) 

t-r  <  i  S  t- 1 

where  n  is  the  order  of  the  AR  process  and  r  is  the  order  of  the  MA  process.  If  (20)  holds  then 
5-max  {  PO'l)  )•  maximum  eigenvalue  of  P(t-l),  is  bounded.  Since  (e(t) }  is  bounded  by 
Tneorem  1,  (G(t) } is  bounded.  Then  it  can  be  shown  (see  Theorem  3.2  of  [8]  ),  that  the  a 
priori  prediction  errors  satisfy  (22). 

(c)  Since  w(t)  and  y(t)  are  bounded  by  yand  e(t)  is  aymptotically  bounded  by  Theorem  1.  it 
can  be  shown  that  for  large  enough  t 


0  T(1  )  <D(1)  <t>T(l)  0(1)  S  0(y2  ) 


i=t 


(27) 


where  0(t)  =  0(t)  -  0*(t),  and  N  is  given  by  Lemma  2.  Then  the  result  follows  by  using  (21) 
and  (20)  in  (27)  and  performing  some  algebraic  manipulations. 


Remark:  To  summarize,  the  above  analysis  has  shown  that  if  the  process  is  stable  and  if  the 
driving  noise  is  bounded  then  the  a  posteriori  prediction  errors  are  bounded.  In  addition  if  a 
richness  condition  is  imposed  on  the  regressor  vector,  then  the  a  priori  prediction  errors  are 
bounded  and  the  parameter  estimates  are  asymptotically  contained  within  a  neighborhood  of 
the  true  parameters. 
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V.  SIMULATION  RESULTS 

Simulations  have  been  performed  to  investigate  the  performance  of  the  EOBE  algorithm  vis 
a  vis  the  ELS  algorithm.  In  this  paper,  we  present  simulation  results  for  two  examples-  a  broad 
band  ARMA  (3,3)  process  and  a  non  SPR  ARMA(3,3)  process  where  the  indices  p,  q  in  an 
ARMA(p,q)  process  refer  to  the  orders  of  the  A(q*‘)  and  C(q'[)  polynomials,  respectively. 

Example  1.  Broad  band  ARMA  (3,3)  process 
The  output  data  { y(t) )  is  generated  by  the  following  difference  equation 
y(t)  =  -  0.4  y(t-l)  +  0.2  y(t-2)  +  0.6  y(t-3)  +  w(t)  -  0.6  w(t-l)  +  0.2w(t-2)  +  0.4  w(t-3) 

The  noise  sequence  (w(t)j  is  generated  by  a  pseudo-random  number  generator  with  a  uniform 
probability  distribution  in  [-1.0,  1.0].  The  upper  bound  y2was  set  equal  to  25.0.  The 
parameter  estimates  were  obtained  by  applying  the  EOBE  algorithm  to  1000  point  data 
sequences.  One  hundred  runs  of  the  algorithm  were  performed  on  the  same  model  but  with 
different  input  noise  sequences.  The  average  squared  parameter  error  L(t),  is  computed  for  the 
AR  coefficients  according  to  the  formula 


where  lj  (t),  the  squared  AR  parameter  error  at  time  t  for  the  j’th  run,  is  defined  by 

n 

lJ(t)  =  ^(ai(t)-ai  )2 

i  =  i 

with  a;  and  adt)  being  defined  by  (1)  and  (6),  respectively.  The  average  squared  parameter 
error  for  the  MA  coefficients  is  defined  analogously.  Fig.  2  displays  the  average  squared 
estimation  errors  for  AR  and  MA  parameters  using  the  EOBE  and  ELS  algorithms.  The  figures 
show  that  the  performance  of  the  two  algorithms  is  comparable.  It  may  be  noted  that  the  AR 
parameter  estimates  have  markedly  lower  variance  (about  the  true  parameters)  than  the  MA 
parameters.  The  average  number  of  updates  for  the  EOBE  algorithm  was  139  for  1000  point 
data  sequences.  Thus  less  than  15%  of  the  samples  are  used  for  updates,  as  compared  to  the 
ELS  algorithm  which  updates  at  every  sampling  instant. 

The  tracking  capability  of  the  EOBE  algorithm(with  a=0.2),  was  compared  with  that  of  the 
ELS  algorithm  (with  forgetting  factor=0.99).  The  same  model  was  used  to  generate  400  data 
points.  The  parameters  were  then  changed  by  150%  and  the  next  400  points  were  generated. 
Finally  the  last  200  points  were  generated  by  using  the  original  parameters.  The  average 
squared  parameter  error  was  evaluated  over  25  runs  and  is  shown  in  Figure  3.  Even  though  the 
formulation  of  bounding  ellipsoids  is  based  on  the  assumption  that  the  parameters  are  constant, 
the  simulation  results  show  that  the  algorithm  is  able  to  accommodate  changes  in  model 
parameters.  Analysis  of  the  tracking  ability  of  the  algorithm  is  currently  under  investigation. 

Example  2.  Non  SPR  ARM  A(  3.3)  process 

The  output  data  (y(t)|  is  generated  by  the  following  difference  equation 

y(t)  =  -  0.6  y(t-l)  -  0.58  yu-2)  -  *0.464  y(t-3)  +  w(t)  +  0.2  w(t-l)  +  0.6w(t-2)  +  0.2  wtt-3) 
The  noise  sequence  is  generated  as  in  the  first  example.  The  upper  bound  y 2  was  set  equal  to 
3.25.  The  maximum  values  (taken  over  25  runs  of  the  algorithm)  of  the  squared  residual 
errors,  at  each  iteration  were  computed  and  were  well  within  the  upper  bound  y  2  even  though 
the  SPR  condition  is  violated. 


VI.  CONCLUSION 

A  recursive  parameter  estimation  algorithm  has  been  extended  for  ARMA  parameter 
estimation.  The  main  features  of  the  algorithm  are  a  membership  set  theoretic  ‘ormulauon  and  a 
discerning  update  strategy.  Convergence  analysis  of  the  algorithm  has  bee-  erformed  under 
the  assumptions  that  the  process  is  stable  and  that  the  noise  is  bounded,  .  he  results  of  the 
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analysis  are  that  the  algorithm  yields  bounded  a  posteriori  prediction  errors  without  SPR  or 
persistence  of  excitation  type  condition.  With  a  persistence  of  excitation  condition  on  the 
regressor  vector,  boundedness  of  the  a  priori  prediction  error  can  then  be  established  and  the 
parameter  estimates  are  shown  to  converge  to  a  neighborhood  of  the  true  parameters. 
Simulation  results  show  that  the  performance  of  the  algorithm  is  comparable  to  the  ELS 
algorithm  while  requiring  far  fewer  updates. 
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ABSTRACT 

This  paper  investigates  an  extension  of  a  recursive  estimation  algorithm  (the  so-called  OBE 
algorithm)  [9-11],  which  features  a  discerning  update  strategy.  In  particular,  an  extension  of  the 
algorithm  to  ARMA  parameter  estimation  is  presented  here  along  with  convergence  analysis.  The 
extension  is  similar  to  the  extended  least-squares  algorithm.  However,  the  convergence  analysis  is 
complicated  due  to  the  discerning  update  strategy  which  incorporates  an  information-dependent 
updating  factor.  The  virtues  of  such  an  update  strategy  are  :  (i)  More  efficient  use  of  the  input  data 
in  terms  of  informadon  processing,  and  (ii)  a  modular  adapdve  filter  structure  which  would 
facilitate  the  development  of  a  parallel-pipelined  signal  processing  architecmrc.  It  is  shown  in  this 
paper  that  if  the  input  noise  is  bounded  and  if  the  process  is  stable,  then  the  a  posteriori  prediction 
errors  are  bounded  even  without  the  SPR  condition.  This  is  in  sharp  contrast  to  the  crucial  role  of 
the  SPR  condition  in  the  analysis  of  the  FJ-S  and  output  error  algorithms.  With  an  additional 
persistence  of  excitation  condition,  the  parameter  estimates  are  shown  to  converge  to  a 
neighborhood  of  the  true  parameters  and  the  a  priori  prediction  error  and  and  the  parameter 
Disadjustment  are  shown  to  be  asymptotically  bounded.  Simulation  results  show  that  the 
parameter  estimation  error  for  the  EOBE  algorithm  is  comparable  to  chat  for  the  ELS  algorithm. 
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I  INTRODUCTION 

In  many  adaptive  signal  processing  applications  such  as  speech  processing,  seismic  dnra 
processing  and  channel  equalization,  a  signal  y(t)  is  often  considered  as  the  output  of  an  OR  filter 
driven  by  unknown  white  noise  w(t)  [1].  The  signal  y(t)  can  therefore  be  modeled  as  an 
autoregressive  moving  average!  ARMA)  process  of  the  form 

y(t)  »  at  y(t-l)+..+  a^ytt-n)  +  w(t)  +  ct  w(t-l)  +.>  cr  w(t-r)  (1) 

Ruing  this  ARMA  model  to  the  measured  data  y(t),  t  =l..T  ,  requires  the  estimation  of  the 
parameters  al aa ,  cr„  cr .  Many  methods  for  the  estimation  of  ARMA  parameters  have  been 
proposed  in  the  literature,  particularly  from  the  spectral  estimation  viewpoint.  Among  the  more 
recent  ate  Cadzow’s  overdetermined  rational  equation  method  [21,  the  spectral  matching  technique 
of  Friedlander  and  Pont  [3],  and  the  extended  Yule- Walker  method  of  Kaveh  [4],  A  common 
feature  of  these  methods  is  the  use  of  the  sample  autocorrelation  sequence  of  the  output  process 
y(t).  In  the  context  of  system  identification,  the  extended  least-squares  (ELS),  the  recursive 
maximum  likelihood  (RML)  and  multi-stage  least-squares  algorithms  have  been  used  to  recursiveiy 
estimate  ARMA  parameters  [5,6].  The  ELS  algorithm  uses  the  a  posteriori  prediction  error  e(t),  as 
an  estimate  of  w(t).  The  regressor  vector  is  formed  from  y(t-l),..,  y(t-n)  and  e(t-l),...  e(t-r).The 
standard  recursive  least-squares  (RLS)  algorithm  is  then  employed  to  update  the  estimates.  The 
algorithm  is  conceptually  simple  but  restrictive  in  the  sense  that  convergence  of  the  algorithm  can  be 
assured  only  if  the  underlying  transfer  function  H(q'^)  =  1/  C(q^)  -  1/2  is  strictly  positive  real 
(S PR),  with  q'*  being  the  delay  operator  and 

C(q*1)  al+Cjq‘l+c,  q'2  * +  cr  q* 

The  RML  algorithm,  which  uses  a  filtered  version  of  the  regressor  vector  used  in  the  ETrs 
algorithm,  does  not  require  H(q'^)  to  be  SPR,  However  the  estimates  have  to  be  monitored  and 
projected  into  a  stability  legion  to  ensure  convergence[5]. 

In  addition  to  the  aforementioned  least-squares  based  methods,  there  exists  a  different  class  of 
estimation  algorithms  that  estimate  membership  sets  of  parameters  which  are  consistent  with  the 
measurements  and  noise  constraints  [7]-{il].  These  algorithms  are  particularly  useful  when  the 
noise  distribution  is  unknown  but  constraints  in  the  form  of  bounds  on  the  instantaneous  values  of 
the  noise  are  available.  To  the  best  of  our  imowledge.  none  of  the  algorithms  has  been  applied  to 
the  problem  of  ARMA  parameter  estimation.  Among  these  algorithms  based  on  membership  sets,  a 
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group  of  seminal  recursive  algorithms  are  the  so-called  opdmai  bounding  ellipsoid  (OBE) 
aigorithms(9- 1 1],  The  OBE  algorithms  had  been  developed  using  a  set-theoretic  formulation  and 
are  applicable  to  autoregressive  with  auxiliary  input  (ARX)  models  with  bounded  noise.  One  of  the 
main  feamres  of  these  temporally  recursive  algorithms  is  a  discerning  update  strategy.  This  feature, 
obtained  by  the  introduction  of  an  informadon  dependent  updating/forgetring  factor,  yields  a 
modular  structure  thereby  increasing  the  potential  for  concurrent  and  pipelined  processing  of 
signals.  The  presence  of  such  a  forgetting  factor  also  gives  the  algorithms  the  ability  to  track  slowly 
time  varying  parameters.  The  algorithms  have  the  advantageous  feature  of  automatic  asymptotic 
cessation  of  updates.  If  a  loose  upper  bound  on  the  noise  magnitude  is  icnown,  and  if  the  input  is 
persistendy  exciting  and  sufficiently  uncorrelated  with  the  noise,  then  it  can  be  shown  that  the 
parameter  estimates  converge  asymptotically  to  a  neighborhood  of  the  true  parameter  vector. 

In  this  paper,  we  extend  one  of  the  OBE  aigorithmsfl  1]  to  the  ARMA  case.  For  the  ARMA 
parameter  estimation  problem,  the  OBE  algorithm  cannot  be  applied  in  its  present  form.  However, 
by  assuming  that  the  input  white  noise  is  bounded  in  magnitude,  the  OBE  algorithm  can  be 
extended  in  a  manner  .similar  to  the  El .3  algorithm.  Convergence  analysis  of  the  resulting  algorithm 
is  performed  under  the  assumption  that  the  process  is  stable  and  that  the  noise  is  boundedThe  a 
posteriori  prediction  error  is  shown  to  be  bounded  without  imposing  any  SPR  condition.  This  is  in 
contrast  to  the  convergence  analysis  of  the  ELS  or  output  error  algorithms  in  which  the  SPR 
condition  is  used  to  prove  boundedness  of  the  prediction  errors  and  convergence  of  parameter 
estimates!  12].  By  imposing  a  persistence  of  excitation  condition  on  the  regressor  vector,  the  a 
priori  prediction  error  of  the  extended  OBE  algorithm  is  shown  to  be  bounded  and  the  parameter 
estimates  are  shown  to  converge  to  a  neighborhood  of  the  true  parameter  vector. 

The  paper  is  organized  in  the  following  manner.  In  Section  H,  a  brief  review  of  the  OBE 
algorithm  and  its  properties  is  presented.  In  Section  HI,  the  algorithm  is  extended  to  ARMA 
parameter  estimation.  Convergence  analysis  of  the  extended  algorithm  is  performed  in  Section  TV. 
The  performance  of  the  algorithm  is  compared  with  the  ELS  algorithm  through  simulation  studies 
in  Section  V.  Section  VI  concludes  the  paper. 


13  THE  OBE  ALGORITHM 

Consider  the  ARX  model  described  by 

y(t)  =  at  y(t-l  )-*■..+■  ^  y(t-n)  +•  b0u(t)  +■  bt  aft- 1)  +..*  bn  u(t-m)  +■  v(t) 
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The  above  equation  can  be  recast  as  : 

y(t)  *9*T<$(c)  +  v(t)  (2) 

where 

9  3  [i| ,  a^  ...  a^  t  bg  •  bj , bjjj  ] 
is  the  vector  of  ante  parameters  and 

<$(t) » C  y(t-l),  y(t-2), ..  y(t-a),  u(t),  u(t-l),  -  u(t-m)  ] T 
is  the  regressor  vector.  A  key  assumption  here  is  that  the  noise  is  bounded  in  magnitude.  Le~, 
there  exists  a  y0  £  0 .  such  that 

v  ^t)  £  y02  for  all  t,  hence 
( y(t)  *  9^dKt)  )2  <  y02 

Let  St  be  a  subset  of  the  euclidean  space  91  n"’aH'1,  defined  by 
St=  {9:  (y(t)-  9T  <D(t) )2  <;  y02  . 9 s 91  } 

From  a  geometric  point  of  view,  St  is  a  convex  polytope  in  the  parameter  space  and  contains  the 
vector  of  true  parameters.  The  OBE  algorithm  starts  off  with  a  large  ellipsoid.  Eg  .  in 
which  contains  all  possible  values  of  the  modelled  parameter  9*.  After  the  first  observation  y(l)  is 
acquired,  an  ellipsoid  is  found  which  bounds  the  intersection  of  E0  and  the  convex  polytope 
S^This  ellipsoid  must  be  optimal  in  some  sense,  say  minimum  voiume[9,I0]  or  by  any  other 
criterion[9,ll],  to  hasten  convergence.  Denoting  the  optimal  ellipsoid  by  Et  one  can  proceed 
exactly  as  before  with  the  future  observations  and  obtain  a  sequence  of  optimal  bounding  ellipsoids 
(  Et }.  The  center  of  the  ellipsoid  Et  can  be  taken  as  the  parameter  estimate  at  the  t-th  instant  and  is 
denoted  by  9(t).  If  at  a  particular  time  instant  i.  the  resulting  optimal  bounding  ellipsoid  would  be 
of  a  "  smaller  size  ",  thereby  implying  that  the  data  point  y(i)  conveys  some  "  information  " 
regarding  the  parameter  estimates,  then  the  parameters  arc  updated.  Otherwise  E,  is  set  equal  to  Ei4 
and  the  parameters  are  not  updated.  It  can  also  be  shown  [11]  that  all  the  ellipsoids  (  E. )  contain 
the  true  parameter  9*,  provided  that  Eg  does. 

Let  the  ellipsoid  E^  at  the  (t-l)-th  instant  be  formulated  by 

Et4  a  {  0  :  (9-9  (t-l)  )T  P-1  (t-l)  (9-9  (M) )  5  ^  (t-l)  ) 
for  some  positive  definite  matrix  P  (t-l)  and  a  non-negative  scalar  cr  (t-l). Then,  given  y(t).  an 
ellipsoid  which  bounds  Et.l  n  St  "tightly"  is 

{  9  :  (1  - \ )  (0  -  9  (t-l)  )T  P'l(M)  (  9  -0  (t-l)  )  +  \  ( y(c)  - 9Td>(t)  )2 

£  (l-L  )  a  Vo  ~  V'o  5  0) 

where  the  forgetting  factor  A.(t)  satisfies  0  ^  Ait)  <1.  The  size  of  the  bounding  ellipsoid  is  related 
to  che  scalar  a2 (t-l)  and  the  eigenvalues  of  P(t-l).  The  update  equations  for  9(c),  P(t)  and  CT’ft)  are 
derived  in  [11].  The  optimal  ellipsoid  which  bounds  che  intersection  of  E.4  and  St  is  defined  in 
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tenns  of  an  optimal  value  of  For  the  OBE  algoritiim  of  [11],  the  optimum  value  X*{  is 
determined  by  annuitization  of  <r(t)  with  respect  to  X,  at  every  time  instant  The  mmirmzxtion 
procedure  results  in  X*  being  set  equal  to  zero  (no  update)  if 

trft+Sfy)  s  iHi)  (4) 

If  (4)  is  not  satisfied,  then  the  optimal  value  of  \  is  computed.  The  parameter  estimation  procedure 
is  depicted  in  Fig.  1.  An  outgrowth  of  the  modular  recursive  estimation  procedure  is  a 
parallel-pipelined  networking  structure  [13].  The  algorithm  is  such  that  the  computational 
complexity  of  the  information  evaluation  (IE)  procedure  is  much  less  than  that  of  the  updating 
procedure  (UPD).  Since,  in  general,  a  good  number  of  data  samples  would  be  rejected  by  the  IE, 
both  the  IE  and  the  UPD  would  involve  significant  amounts  of  idle  time.  A  viable  scheme  then  is  to 
configure  a  parallel-pipelined  network  comprising  of  such  modular  estimators  to  process  signals 
from  multiple  channels.  Apart  from  reducing  hardware  costs,  such  a  scheme  would  offer  increased 
reliability  since  the  failure  of  one  UPD  processor  would  not  cause  any  of  the  channels  to  foil,  in 
contrast  to  a  system  with  a  dedicated  UPD  processor  for  each  chapel 


Figure  I  Modular  recursive  :  ammeter  estimator 
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m  EXTENSION  TO  ARMA  MODELS 

The  ARMA  model  described  by  (1),  can  be  rewritten  as 

w(t)  »  y(t)  -  Q*7  <D'(0  (5) 

where  9*  is  the  vector  of  true  parameters  and  is  now  denned  by 
9*  =  [ax ,  a,  -  aa ,  c1 ,  Cj cr  ]T 
®‘(t)  =*  C  y(c-I) .  y(t-a) ,  w(M)  f  _w(t-r)  ] T 

Here  again,  w(t)  is  .'it«irnwri  to  be  bounded  in  magnitude  by  Yq*  Since  the  values  of  the  noise 
sequence  { w(t) }  are  not  available,  the  regressor  vector  <D’( t)  is  not  known  exactly.  If,  however,  at 
rime  t,  an  estimate  of  9*. 

9  ( t )  *  [  ax(t), an  (t)  Cj  (t),  ..,cr  (t)  lr  (6) 

is  available,  w(t)  could  be  estimated  by 

e(0  =  y(0  -  9  T  (t)cD(t) 

where 

<D(0  =[y(t-l) ,  ..  y(t-n)  ,  s(t-i) .  —  e(t-r)  ] T  (7) 

It  follows  from  (5)  and  the  boundedness  of  1  w(t)  I  that 
(  y(t)  -  e^O)2  £  Y02 

Hence  if  e(t)  -  w(t),  then  d)'(t)  » <t>(t)  and  for  a  suitable  yZ  0,  such  that  7  2  >  Yq2 

(  y(t)  -  9'rr<D(t))2  £  Y2  (8) 

Thus  9*  <=  St ,  where  St  is  defined  by 

St=  (9:  ( y(t)  -  9T<t>(0)2}  £  Y2.9sRa+t  } 

In  essence,  if  the  difference  between  e(t)  and  w(t)  is  small,  applying  the  OBE  algorithm  will  yield  a 
sequence  of  bounding  ellipsoids  {Et }  in  the  parameter  space.  If  (8)  holds  for  all  t  and  9  s  Eg ,  then 
it  is  easy  to  see  that  9*  s  Et  for  ail  rime  instants  t.  The  optimal  bounding  ellipsoid  Et  is  described 
by 

Et  =  (  9  s  Rtt*r :  (  9  -  9  (0  )T  P‘l(t)  (9-9(0)  £  oT 0  }  (9) 

and  the  update  equations  which  follow  directly  from  [11]  are 

P‘l(t)  =*  (l-XpP'l(t-i)+  L  <tK0<t>T(0  (10*) 

9(0  =  9(t-l)  +  \  P(t)<t>(0  5(t)  (10b) 

5(0  =  y(0  -  9T(t-l)<J>(t)  dOc) 
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ff'(0 


(1-  \)  <J^t-i) 


1-V\G(0 


(10d) 


where 

G(t)  »$T(r)P(t-t)<&(t)  (lOe) 

The  above  recursive  relations  along  with  the  initial  values 

P*l(0)  *1,  8(0)*0and  o^O) *  1/e  .with  e  «1 
form  the  basis  of  the  extended  optimal  bounding  ellipsoid  (EOBE)  estimation  algorithm!  14].  The 
matrix  inversion  lemma  can  be  used  to  obtain  the  recursion  for  P(t) 


P(0 


1  r  X  Pft-lJdKt)^7 (t)  Pft-1)  , 

—  [  P(M)-  - - J 

l-X  1-X  +X  G(t) 


(10f) 


It  is  easily  shown  from  (lOd)  that  for  the  EOBE  algorithm,  minimizing  cr(t)  with  respect  to  \  at 
every  rime  instant  yields  the  same  updating  criterion  and  the  same  algorithm  for  determining  the 
optimum  value  of  the  forgetting  /  updating  factor  X*t ,  as  in  [1 1].  In  particular. 


If  cr^t)  +  S^t)  £  y^t)  then  X*t  =  0, 


(11) 


where 


otherwise 

X*t  =  min  (a,  vt ) 


(12) 


and 


l-P(t) 


'■=\ 


l-G(t) 


G(t) 


1+  P(t)  (  G(t)  -i) 


2  2, 

5J(t) 


ifSz(t)=0 
if  G(t)  =  1 


(12a) 

(12b) 


]  if  0(0  (G(t)-l)  +  1  >0  (12c) 


if  (3(t)  ( G(t)  - 1)  +  1  S  0  (12d) 


(12 e) 


The  algorithm  thus  retains  the  discerning  update  jtrategy  and  the  modular  adaptive  filter 
structure!  11,13]. 
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Choice  of  y  2 

Finding  a  value  of  y  2  to  ensure  that  (8)  holds  for  all  rime  instants  may  aot  be  easy.  Instead,  we 
will  show  in  the  next  section  that  choosing  y  2  to  be  an  upper  bound  on  the  square  of  the  magnitude 
of  the  output  y(t)  will  ensure  that  the  parameter  estimates  obtained  by  applying  the  EOBE  algorithm 
converge  to  a  neighborhood  of  the  true  parameters. 


It  is  easy  ©  see  that  if  iw(t)l  is  bounded  so  is  ly(t)U  ns  the  output  y(t)  is  related  to  the  input 
according  to 

Gil'S 

Y(z)  W(z) 

A(z) 

where  Y(z)  and  W(z)  are  the  z- transforms  of  y(t)  and  w(t),  respectively,  and 
A(z*1)  =  1  +  a^  z  *l  +  z‘2  + ..  +  an  z-™ , 

CCz*1)  =  1  +  ct  z  'l  +•  Cj  z‘2  + ..  +  cr  z* 

The  output  process  y(t)  can  thus  be  expressed  as 
y(t)  =  h(t)  *  w(t) , 

where  *  denotes  discrete  convolution  and  h(t)  is  the  impulse  response  of  the  filter  C(zrl)  /  A(z'1). 
Assume  that  the  process  is  stable,  then  there  exists  a  finite  K  such  that 


h(i)  I  £  K  <°° 


i*0 


and  if  I  w(t)  I £  y0  then 


y(t)  i  <;  £ih<i)i  i  w(t-i)  I  s  K;  =  y 

i*0 


(13) 


In  order  to  obtain  a  value  for  y,  the  user  would  therefore  need  the  bound  on  !w(t)l  and  an 
estimate  of  K.  Atemadveiy,  the  outputs  could  be  monitored  and  a  bound  on  ly(t)l  could  be  obtained 
before  starting  the  actual  parameter  estimation  procedure.  A  loose  upper  bound  on  either  K  or  the 
magnimrift  of  the  output  would  suffice  since  simulations  have  shown  that  using  values  for  y  that 
are  several  rimes  larger  or  smaller  than  the  actual  bound  on  ly(t)l  has  no  deleterious  effects  on  the 
quality  of  estimates  or  convergence  rate. 
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IV  CONVERGENCE  ANALYSIS 

We  will  first  show  chat  if  the  updating  factor  sequence  is  chat  which  minimizes  a2  at  every 
instant  ( denoted  by  (  X*t }  )  then  the  parameter  estimates  converge.  To  show  that,  the  following 
lemma  will  be  needed. 

Lemma  L  If  the  magnimrfc  of  the  noise  w(t)  is  upper  bounded  and  the  process  is  stable  then 
aHt),  defined  in  (lOd),  is  always  non-negative  provided  that  the  initial  values  8(0),  P'HO)  and 
<r(0)  are  chosen  such  that  9T(O)P‘I(O)0(O)  £  a2  (0) 

Proof.  From  (10a),  (10b)  and  (10c) 

9  T(t)  P'l(t)  0  (t)  =  (1-X)  0  T(t-1)  P4(t-i)  8  (M)  +  H  y2(t)  -  SXt)  (1-  X<t>T(t)  P(t)  <b(t) )  1 
=  ( 1-  \ )  ( 1-  Lj )  _(1-Lt)  8  T(0)  P'l(0)  0  (0) 

t 

+ X  -  s2®  ( 1  - p®  ^0)1 

i«l 

where 

q.t=  L.(l-X.+l)...(l-X)  >  0  VU 

Since  the  process  is  stable  and  { w(t) }  is  bounded,  there  exists  ay2  such  that  y  2  (t)  £  y  2 
The  positive  semi-definiteness  of  P*l(t)  will  therefore  imply  that 

t  t 

fJ(l-L.)  0T(O)Pl(O)0(O)  +^q.[[y2-52(i)(l-Ld>T(i)P(i)cI>(i)  )]  >0  (14) 

i»l  i*l 


Butfiom(lOf) 


x,d>T(i)  pq  m)  = 


JL  G(i) 


l-L*LG(i) 

L  l 


Using  (15)  in  (14)  yields 


n^y  9T(O)P’l(O)0(O)  ^(y^cn 


1-3L 


L»1 


i«l 


1-  X.  +  X.G(i) 


]  ^  0 


From  (lOd),  a  non- recursive  expression  for  cr  (t)  can  be  obtained  as 


r 

<r2(t)  =  (l-X1)(l-LI)...(t-L[)o2(0)  +  £qJy 


2  S2(i)(l-X.) 

I  -X.-hL.Gtf) 


i»i 


(15) 


(16) 


(17) 
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Sine:  cz  (0)  >  9T(O)P*l(O)0(O),  (17)  implies  that  cr(t)  >0  for  ail  t .  This  result  will  hold  for 
any  sequence  of  forgetting  factors  (X,  j  with  0  S  X,  <  l. 

Remark:  The  fact  that  a2  (t)  is  non-negative  and  non- increasing  will  play  a  crucial  role  in  the 
subsequent  convergence  analysis.  It  also  guarantees  chat  the  ellipsoids  ore  non  singular  (non  empty) 
sets. 

Theorem  L  If  the  assumptions  of  Lemma  1  are  satisfied  then 

1.  lim  e2(t)  a  [O.y2]  (18) 

2.  lim  a  (t)  s  [  0,  y2]  (19) 

«• 

Note  that  throughout  this  section,  expressions  like  (18)  should  not  be  taken  to  mean  that 
HmL  efrrt  exists  necessarily,  but  rather  that  e^t)  becomes  asymptotically  less  than  or  equal  to  y2. 

Proof.  It  is  easily  shown  from  (10b),  (10c)  and  (15)  that  the  a  posteriori  and  a  priori  prediction 
errors  are  related  by 

l-X 

eft)  =  - i - 5(t)  (20) 

.  l-Xt  +  XcG(t) 

Note  that  the  non-aegadveness  of  G(c)  implies  that  e2(t)  £  S^t).  Substituting  (20)  in  (lOd)  and 
using  the  fact  that  0  £  \  £  a  <1,  yields 

a2(t)  £  <r(t-i)  +  X  T  -  X  ( — - — l—  )e:(t)  (21) 

l-\ 

Since  G(t)  is  non-negadve 

a2(t)  £  «r2(t-l)  X^lT-fii)] 

or 

Xt(y2- e2(t)  ]  >  a2(t)  -  a'(t-l) 

The  tight  hand  side  tends  to  zero  since  cr  (t)  is  a  non  negative,  non  increasing  sequence.  Thus 

fim  Xt[y2-e2(t)]  e  [0,rl  '  (22) 

This  the  case  if  and  only  if  either 

1.  lim  e2(0  a  [0,  y2]  in  which  case  (18)  is  established. 


(23a) 
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2.  lim  X£  =  0  and  lim  Xte  (t)  »  0 


Let  the  forgetting  factor  which  minimi?^  <j2  (t)  u  every  instant  t  be  denoted  by  X*t,  then 


if  X  >  0,  then  £  0 

dXt  xt=xt 


Taking  derivatives  with  respect  to  \  in  (lOd),  and  using  (24)  yields 

.  \2H)G(  o 

<T‘(t)  £  <tVo - — : - 2 

( i-x +x  G(or 

The  non-negativity  of  a2  (t)  therefore  implies 


^  X.2S2(i)G(i) 

i»i  ( 1  -  X.*  +  X."  G(i)  )2 


s  <r(0)  -  a2(t)  <  oo 


Hence 


X*282(t)  G(t) 

lim  - 7 — 7 - ;  =  0  (27) 

t—  ( 1  -  Xt  +  Xc  G(t) ) 

Denote  the  optimai  forgetting  factor  by  X,  for  ease  of  notation-  Substituting  (20)  in  (27)  yields 

X2G(t)e2(t)  -►  0  (23) 

Thus  if  (23b)  holds,  then  for  all  A  >  0,  there  errists  >  0  such  that  for  all  t  >  N:  , 

&  (29a) 

X^t)  <  A  (29b) 

X2tG(t)  e^t)  <  A  (29c) 

Consider  the  four  cases  of  (12)  applicable  to  this  situation 
Case  1.  S^t)  =  0.  Then  efy)  £  S^t)  <;  y2. 

Case  2-  G(t)  =  1,  Xt»  (l-(3(t)  )/2  <  A  .  Hence  [3(t)  >  l  -  2  A  .  It  then  follows  from  the 

definition  of  |3(t)  that  g^t)  £  y2 

Case  3.  P(t)  (G(t)  - 1)  +  1  S  0.  If  A  <  a,  then  (11)  has  to  hold  and  so  e^t)  S  y2 

Case  4.  p(t)  (G(t)  - 1)  +  1  >  0.  Define 


P’(t)  £ 


y2-  cr2(t-i) 
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rl-X>\G(t)  2 

m  -  m  ; — ] 


l-V 


(31) 


For  this  case  the  following  relation  between  \  and  f$(t)  can  be  derived  from  (1 2c),  exactly  as  in 

cm. 


pc) 


l-\(G(t)  -1)  -  2Xj 


[U^GW  l2 

Substituting  (32)  into  (31)  and  multiplying  by  s^t)  yields 

,  ,  a  xjG(t)eV) 

Y- <T(t-l) »  e2(t)  m  *  e  ») - “T” 


(32) 


(33) 


(1*  V 


Substituting  (29)  into  (33)  gives 

Y2-ol(t-l)  2  e2(t)  A— 
d-\) 


a  e2(t) — a_ 

(1-4) 

Thus 

fi2(t)  +  a2(t-l)  <1  y2riO(A) 

and  since  A  can  be  arbitrarily  small,  (18)  is  satisfied.  Furthermore,  from  (lOd) 

<r2(t)-y2  £  (1 -Xt)(<y2(t-l)-v2) 

And  from  (11),  ^  >  0  if  a2  (t)  >  y2  This,  together  with  Lemma  1,  would  imply  that 
lim  a2(t)  e  (0  ,y2] 

t-*  — 

Hence  (1°)  is  also  satisfied. 


Remark.  Boundedness  of  e‘(t),  the  a  posteriori  prediction  error,  has  been  shown  without 
imposing  any  persistence  of  excitation  conditions  on  the  regressor  vector  .  Furthermore,  in 
contrast  to  the  ELS  algorithm,  the  SPR  condition  is  not  required. 


Boundedness  of  Jp(t),  the  a  priori  prediction  error,  and  convergence  of  the  parameter  estimates 
to  a  neighborhood  of  the  true  parameter  can  be  assured,  by  requiring  the  regressor  vector  to  be 
persistently  exciting-  The  next  lemma  relates  the  positive  definiteness  of  P"l(t)  to  the  richness  of  the 
regressor  vector  <$(t). 


1  1 


Lemma  2.  If  there  exist  positive  (X;  and  N  sued  chat  for  ail  t 

i+N 

£  WS)  >  a,  I  >  0  (34) 

i»c 

then  there  exists  a  positive  a4  such  that 

P-l(t)*<x4I>0 

Proof  of  the  lemma  is  the  same  as  that  of  Theorem  4.1  of  [11],  it  is  thus  omitted  here. 

Remark.  The  positive  definiteness  of  P"l(t)  implies  that  the  eigenvalues  of  P(t)  are  upper 
bounded. 


Theorem  2.  If  assumptions  of  T-wrim*  1  are  satisfied  and  (34)  holds  then  the  HOBE  algorithm 
ensues : 

(a)  Parameter  difference  convergence 

lim  II  9(t)  -  9(t-k)  II  =0  (35) 

for  any  finite  k. 

(b)  Bounded  a  priori  prediction  errors 

lim  82(t)  g  CO, y2]  (36) 

t  -*  •• 

(c)  Bounded  parameter  misadiustment 

lim  II  9(c) -9* II2  £  My2  <  °°  (37) 

for  some  finite  M. 

Proof. 

(a)  From  (10b)  and  (lOf) 

2  V  2<PT(t)P2(t-l)d>(t)52(t) 
il  9(c)  -  9(t-l)  II  =»  — - - - 

(  +  \  G(t) ) 2 


*  eM(  KM)  J 


Xc2G(t)52(t) 

(l-Xt+XtG(c))2 


(38) 


If  (34)  holds  then  by  Lemma  2,  eTnax  (  P(t-l)  ),  the  maximum  eigenvalue  of  P(t-l),  is  bounded 
for  ail  t ,  and  hence  by  (27) 

II  9(t)  -9(c-l)  II  -+Q  (39) 

Applying  the  Minkowski  inequality  to  II  9(t)  -  9(t-k)  II  and  using  (39)  completes  the  proof  of  (35). 
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(b)  From  (lOe)  and  (13)  it  follows  that 

c«>  S  <„  (P(M)  I  [  a? 1  *  t  rax  efo  I  (40) 

M  S  iS  6-1 

where  a  is  the  order  of  the  AR  process  and  r  is  the  order  of  the  MA  process.  If  (34)  holds  then 
gnuT  (  P(t-l)  },  the  maximum  eigenvalue  of  P(t-l),  is  bounded.  Since  (e(t)}  is  bounded  by 
Theorem  1,  { G(t) )  is  bounded.  Then  it  can  be  shown  (see  Theorem  3-2  of  [11] ),  that  the  a  priori 
prediction  errors  satisfy  (36). 

(c)  From  (18)  we  conclude  that  there  exists  Nt  such  that  for  all  t  >  MI,  I  e(t)  I  £  y. 

Define 

(0  *  [  y(t-l).  y(t-2),  -,y(t-a)  ]T, 

(t)  *  [  e(t- l),e(t-2),  ..4(t-r)  ]T, 
x°2(t)  =  [  w(t-l),  w(t-2),  ^w(t-r)  ]T, 

Denote  the  actual  and  the  estimated  AR  parameters  by  a  and  a(t),  respectively,  and  the  actual  and 
the  estimated  MA  parameters  by  c  and  c(t),  respectively.  Thus 

I  y(t)  -  aT(t)  xx(t)  -  cT  (t)  x£)  I  S  y  (41) 

Substituting  (1)  in  (41)  yields 

laT(t)xl(t)+  C  +  w(t)  +  CT xjt)  -  cT xj(t)  I  £  y  (42) 

where 

a  (t)  =.a  (t)  -  a  and  c(t)  =  c(t)  -  c . 

Hence 

I  a  T(t  )x;(t)  +  c  T(t)  x£t)  I  £  y  +•  lw(t)  M  c xjt)  l+l  cT<t)  I  (43) 

From  (18),  and  the  boundedness  of  (w(t)  },  it  follows  that  the  right  hand  side  of  (43)  is  of  0(y). 
Since  <Dr(t)  =  [  xtT  (t),  XjT  (t)  ]T ,  it  follows  chat 
!9T(t)dKt)l  £  0(y) 

where 

9T(t)>[ar(t),‘5r(t)]T 

Hence 

l9T(t)<K(t)0T(t)9(t)l  £  0(y 2 ) 

So 

£9T(l)<$(l)<&T(l)9(l)  S  0(y: )  (44) 

l«t 

From  (35),  for  ail  z  >  0,  there  exists  >T  such  chat  for  ail  t  >  N* 
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II  0(0  -  9(c+l)  II  =  II  9(0  -  9(t  +  l)  II  <  £ . 

Hence  for  any  finite  !c 

II  9(0  -  9(t+k)  U  =  II  9(t)  -  9(t+l)  +  8(t+L)  -  8(i+2)  + ..  +  9(t+ic-l)  -  8(t+k)  II 

Applying  The  Minkowski  inequality  yields 

II  0(t)  -  9(t+k)  II  S  [  II  0(t)  -  9(t+l)  11+  il  0(t+l)  -  0(t+2)  1I+-+  ll  9(t+k-l)  -  9(t+k)  111 

£  0(e)  (45) 

Using  Lemma  A-l,  (see  appendix),  yields  for  t  >  max  ( Nx ,  N* ) 

9T(t  )(£  dKD<&T(D  ]  9(0  *  0(Y2  )  +  0(e2) 

And  substimting  (34)  yields 

lim  0T(t)  9(0  ^  Oiy1 ) 

and  hence  (37)  follows. 

Remark:  To  summarize,  the  covergence  analysis  has  shown  that  if  the  process  is  stable  and  if  the 
driving  noise  is  bounded  then  the  a  posteriori  predicdon  errors  are  bounded.  In  addition  if  a 
sufficient  richness  condition  is  imposed  on  the  regressor  vector,  then  the  a  priori  prediction  errors 
are  bounded  and  the  parameter  estimates  are  asymptotically  contained  within  a  neighborhood  of  the 
true  parameters. 

V  SIMULATION  RESULTS 

Simulations  have  been  performed  to  investigate  the  performance  of  the  HOBE  algorithm  vis  a  vis 
the  ELS  algorithm.  In  this  paper,  we  present  simulation  results  for  two  examples-  a  broad  band 
ARMA  (3,3)  process  and  a  narrow  band  ARMA(2,2)  process  where  the  indices  p,  q  in  an 
ARMA(p,q)  process  refer  to  the  orders  of  the  A(q*1)  and  C(q'1)  polynomials,  respectively. 

Example  I.  Broad  band  ARMA  (3,3)  process 

The  output  data  (y(t)}  is  generated  by  the  following  difference  equadon 

y(t)  =  -  0.4  y(t-l)  +  0.2  y(t-2)  +  0.6  y(t-3)  +  w(t)  -  0.6  w(t-l)  +  Q.2w(t-2)  +  0.4  w(t-3) 

The  noise  sequence  ( w(t) }  is  generated  by  a  pseudo-random  number  generator  with  a  uniform 
probability  distribution  in  [-1.0,  i.0].  The  upper  bound'/2  was  set  equal  to  25.0.  The  parameter 
esdmates  were  obtained  by  applying  the  EOBE  algorithm  to  1000  point  data  sequences.  One 
hundred  runs  of  the  algorithm  were  performed  on  the  same  model  but  with  different  input  noise 
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sequences.  The  average  squared  parameter  error  L(t),  is  computed  tor  the  AR  coefficients 
according  to  the  formula 

too 

u‘>  =  i5oI'i"> 

j-t 

where  Ij  (t),  the  squared  AR  parameter  error  at  time  t  for  the  j’th  run,  is  defined  by 

lj(0  =■£(!(!) -a.,)2 

i«l 

with  a^  and  a^(t)  being  defined  by  (1)  and  (5),  respectively.  The  averafe  squared  parameter  error  for 
the  MA  coefficients  is  defined  analogously.  Figure  2  displays  the  average  squared  esdmadon  errors 
for  .AR  and  MA  parameters  using  both  the  EOBE  and  the  FT-S  algorithms.  The  curves  show  that  the 
performance  of  the  two  algorithms  is  comparable.  It  may  be  noted  that  the  AR  parameter  estimates 
have  markedly  lower  variance  (about  the  true  parameters)  than  the  MA  parameters.  The  average 
dumber  of  updaies  for  the  EOBE  algorithm  was  139  for  1000  point  data  sequences.  Thus  less  than 
15%  of  the  samples  are  used  for  updates,  as  compared  to  the  ELS  algorithm  which  updates  at 
every  sampling  instant. 

The  effect  of  different  choices  for  the  upper  bound  y2  on  the  performance  has  also  been 
studied.  For  each  value  of  y2,  the  asymptotic  average  squared  parameter  error  T,  was  computed 
over  25  runs  of  the  algorithm,  according  to  the  formula 

15 

t  =  25  2j  Q/ioco-e- iiz 
j-i 

where  9j(1000)  is  the  parameter  estimate  at  the  1000'th  iteration,  in  the  j’th  run.  The  second 
cciumn  of  Table  I  lists  the  different  values  of  T  obtained  when  y2  is  varied  from  0. 1  to  200.  It  is 
clear  that  the  algorithm  is  insensitve  to  the  value  of  y2,  since  both  the  tap  error  and  the  average 
number  of  updates  are  almost  constant.  It  was  seen  that  for  low  values  of  y2  even  though  the  true 
parameter  may  fall  outside  the  bounding  ellipsoid,  it  is  captured  in  subsequent  iterations. 

The  performance  of  the  algorithm,  when  the  noise  sequence  {w(t)}has  gaussian  distribution, 
was  evaluated  in  a  dmilar  fashion.  A  constant  value  of  y2  =25  was  used  and  the  standard  deviation 
of  the  noise  was  varied.  The  results  for  25  runs  of  the  algorithm  are  shown  in  Table  EL  It  is  clear 
that  even  though  the  unbounded  noise  does  cause  the  output  to  exceed  the  bound,  the  effect  on  the 
parameter  estimates  is  marginal. 

Finally,  the  tracking  capability  of  the  EOBE  algorithm! with  a=0.2),  was  compared  with  that  of 
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the  ELS  algorithm  (with  forgetting  factor=0.99).  The  same  model  was  used  to  generate  400  dan 
points.  The  parameters  were  then  changed  by  150%  and  the  next  400  points  were  generated. 
Finally  the  last  200  points  were  generated  by  using  the  original  parameters.  The  average  squared 
parameter  error  was  evaluated  over  25  runs  and  is  shown  in  Figure  3.  Even  though  the  formulation 
of  bounding  ellipsoids  is  based  on  the  assumption  that  the  parameters  are  constant,  the  simulation 
results  show  that  the  algorithm  is  able  to  accomodate  changes  in  model  parameters.  Analysis  of  the 
tracking  ability  of  the  algorithm  is  currendy  under  investigation. 

Example  2.  Narrow  band  ARMA  (2*2)  process 

The  output  data  (y(t)}  is  generated  by  the  following  difference  equation 

y(t)  =*  1.4  y(t-l)  -  0.95  y(t-2)  +  w(t)  -  0.86  w(t-l)  +  0.431  w(t-2) 

The  noise  sequence  is  uniformly  distributed  in  [-1. 0,1.0],  as  in  the  first  example.  The  upper 
bound  y  2  was  set  equal  to  20.0.  The  avenge  squared  parameter  errors  are  calculated  over  100  runs 
and  plotted  in  Fig. 4.  The  avenge  number  of  updates  was  35  for  1000  point  dan  sequences. 

For  this  example  too,  different  values  of  the  upper  bound  y  2  were  used  and  no  significant 
difference  in  the  quality  of  estimates,  number  of  updates  or  convergence  rate  was  observed.  Thus, 
it  is  verified  once  again  thai  a  precise  knowledge  of  the  upper  bound  is  not  a  prerequisite  for 
satisfactory  performance  of  the  algorithm. 

Example  3.  Non  SPR  ARMA(3,3)  process 

The  output  dan  (y(t)}  is  generated  by  the  following  difference  equation 

y(t)  =  -  0.6  y(t-l)  -  0.58  y(t-2)  -  0.464  y(t-3)  +  w(t)  +  0.2  w(t-l)  +  0.6w(t-2)  +  0.2  w(t-3) 

The  noise  sequence  is  genented  as  in  the  first  example.  The  upper  bound  y  2  was  set  equal  to 
3.25.  The  maximum  values  (taken  over  25  runs  of  the  algorithm),  of  the  squared  residual  errors,  at 
each  iteration  are  displayed  in  Fig  5.  Note  that  the  squared  a  posteriori  predicaon  errors  are  well 
within  the  upper  bound  y  2  even  though  the  SPR  condition  is  violated. 
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VI  CONCLUSION 

A  recursive  parameter  estimation  algorithm  has  been  extended  for  ARMA  parameter  esdmadon. 
The  main  features  of  the  algorithm  are  a  membership  set  theoredc  formulation  and  a  discerning 
update  strategy.  Convergence  analysis  of  the  algorithm  has  been  performed  under  the  assumptions 
that  the  process  is  stable  anH  that  the  noise  is  bounded.  The  main  results  of  the  analysis  are  that  the 
algorithm  yields  bounded  a  posteriori  prediction  errors  without  SPR  or  persistence  of  excitation 
type  condition.  With  a  persistence  of  excitation  condition  on  the  regressor  vector,  boundedness  of 
the  a  priori  prediction  error  can  then  be  established  and  the  parameter  estimates  are  shown  to 
converge  to  a  neighborhood  of  the  true  parameters.  Simulation  results  show  that  the  performance  of 
the  algorithm  is  comparable  to  the  ELS  algorithm  while  requiring  far  fewer  updates. 
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Lemma  A- 1  For  the  EOBE  algorithm  of  (10),  if  for  any  a  >  0,  there  cost  positive  N.  Nt ,  k 
and  'f  such  that  for  t  >  Nt 


(i)  II  9<t)  -  0(t+k)  II  <  0(e) 

GO  le(t)l  £  y,  ly(t)l  £  y 

t+N 

Cm)  2eT(i)<&a)*Ta)5i)  5  °(y2) 

l-t 

Then 

t+M 

9  T(t )  ( X  $0)  OrG)  ]  9(t)  i  0(y2  )  +  0(e2 ) 

l-i 

Proof.  Define 

[di (O.dj(t) . d^f(t)]T  =  0(t) 

(rt  (t),r2(t),  rQi.r(t)  ]T  =  <t>(t) 


Then 


t+N  tr*r 


9  Ta )  <t>a)  <t»Ta)  9(D  =  id  4(l)r(I>} 


l-t  t-l 


For  any  1  s  ( t,  t+N  ],  we  have 


(A-l) 

(A-2) 


(A-4) 


T~ir  n*r 

X  s  X  ^(Ord)  +  X,ldi(t)*di(D|  lri(D'  (A-6) 

i*  1  i-  l  i-  1 

Bv  assumption  (A-2),  !r^  (1)1  <  y.  Hence  applying  the  Schwartz  inequality  to  the  last  term  of  (A-6) 
yields 


X  ^Wr-O)  S  X  +  7(rnr)2  {  X 1  *  <*1©  ^  1 


Hence  by  assumption  (A-3),  it  follows  that 


n*r  a -r 

X  i(t)ra)  <  X  dja)r.a>  +  0(e) 


It  can  be  shown  smhlariy  that 


T-r 

X  i(l)r;(D  5  X  ^(0  r.(l)  +  0(e) 


Thus 


r 


n*r  n+r 

12,  di^r;^  -  Z  *  0(6) 

i*l  i»l 

Using  the  fact  thai  IW  -  iaJ  £  la  -  bl  for  any  a,be  R  yields 

fl*t  n+r 


l£  <*i(t)ra(0 1  5  |£  d.d)r.(Dl  +  0(e) 

t*l  »»l 


Consequently 


ti+r 


n+r 


|]T  d.(t)r.a)  t2  £  2iS  Al®ri®iZ  +20(e?) 

i»l  i-l 

And  finally  (A-4)  is  obtained  by  using  (A-l  1)  in  (A- 5). 
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(A- 10) 


(A- 11) 
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TABLE  I 


7  ? 


Upper  bound 

'J 

r 

Average  tap  error 

T 

Average  number 
of  updates 

Total  number  of 

dmes  ly(t)i  >  y 

Nr umber  of  rimes 

9*  is  out  of  ellipsoid 

0.1 

1.12  xlO’2 

154 

20207 

2935 

0J 

1.11  xlO*2 

154 

14539 

2600 

1.0 

1.09  xlOr2 

154 

10814 

6 

5.0 

1.07  xlO*2 

154 

1728 

0 

15.0 

1.08  xlO-2 

153 

12 

0 

25.0 

1.14  xlO*2 

154 

0 

0 

50.0 

1.45  xlO*2 

156 

0 

0 

200.0 

I.l5xl0-2 

156 

0 

0 

TABLE  H 


Standard  deviation 

of  noise 

Average  tap  error 

T 

Average  number 
of  updates 

Total  number  of 

dmes  ly(r)l  >  y 

Number  of  rimes 

9*  is  out  of  ellipsoid 

0.25 

0227 

139 

0 

0 

0.5 

0224 

133 

0 

0 

1.0 

0219 

12S 

436 

0 

1.5 

0224 

122 

2705 

0 

2.0 

0222 

119 

5968 

50 

3.0 

0227 

113 

10723 

1573 

21 


630 


1000 


3  £00 

Figure  2. 
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Aveng?  squared  renncrer  erecr  for  the  EOBH  and 
ELS  algorithms  -  Example  l. 
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Avenge  squared  pansear  for  the  EOBE  and 
ELS  algorirfass  -  Example  1 


Figure  5.  Maximum  residual  error  over  25  runs  for 
a  Nou-SPR  model  -  Example  3. 


t 


